Preface to the Second Edition


There is a lot that is different about this second edition. First, there is a co-author, 
without whose help this revision would not have been possible. Second, we have 
benefited from countless letters from readers and colleagues who have pointed out 
errors and omissions and have made valuable suggestions over the past 25 years. 
These communications make this revision worth the effort. Third, we have tried to 
update the content of the book while striving to preserve the character and spirit of 
the first edition. 
Here are some of the numerous changes that have been made: 


1. The Introduction section has been removed. We have also removed Chapter
14 on sequential statistical inference.

2. Many parts of the book have undergone substantial rewriting. For example,
Chapter 4 has many changes, such as inclusion of exchangeability. In Chapter
3 an introduction to characteristic functions has been added, in Chapter 5
some new distributions have been added, and in Chapter 6 there have been
many changes in proofs.

3. The statistical inference part of the book (Chapters 8 to 13) has been updated.
Thus in Chapter 8 we have expanded the coverage of invariance and have
included discussions of ancillary statistics and conjugate prior distributions.

4. Similar changes have been made in Chapter 9. A new section on locally most
powerful tests has been added.

5. Chapter 11 has been greatly revised and a discussion of invariant confidence
intervals has been added.

6. Chapter 13 has been completely rewritten in the light of increased emphasis
on nonparametric inference. We have expanded the discussion of U-statistics.
Later sections show the connection between commonly used tests and
U-statistics.

7. In Chapter 12, the notation has been changed to conform to the current
convention.

8. Many problems and examples have been added.

9. More figures have been added to illustrate examples and proofs.

10. Answers to selected problems have been provided.


We are truly grateful to the readers of the first edition for countless comments and 
suggestions and hope we will continue to hear from them about this edition. Please 
direct your comments to vrohatg@attglobal.net or to saleh@math.carleton.ca.

Special thanks are due Ms. Gillian Murray for her superb word processing of the 
manuscript, and Dr. Indar Bhatia for figures that appear in the text. Dr. Bhatia spent 
countless hours preparing the diagrams for publication. We also acknowledge the 
assistance of Dr. K. Selvavel. 


VIJAY K. ROHATGI
A. K. Md. EHSANES SALEH 


Preface to the First Edition 


This book on probability theory and mathematical statistics is designed for a three- 
quarter course meeting four hours per week or a two-semester course meeting three 
hours per week. It is designed primarily for advanced seniors and beginning grad- 
uate students in mathematics, but it can also be used by students in physics and 
engineering with strong mathematical backgrounds. Let me emphasize that this is a 
mathematics text and not a “cookbook.” It should not be used as a text for service 
courses. 

The mathematics prerequisites for this book are modest. It is assumed that the 
reader has had basic courses in set theory and linear algebra and a solid course in 
advanced calculus. No prior knowledge of probability and/or statistics is assumed. 

My aim is to provide a solid and well-balanced introduction to probability theory 
and mathematical statistics. It is assumed that students who wish to do graduate 
work in probability theory and mathematical statistics will be taking, concurrently 
with this course, a measure-theoretic course in analysis if they have not already had 
one. These students can go on to take advanced-level courses in probability theory 
or mathematical statistics after completing this course. 

This book consists of essentially three parts, although no such formal divisions 
are designated in the text. The first part consists of Chapters 1 through 6, which 
form the core of the probability portion of the course. The second part, Chapters 7 
through 11, covers the foundations of statistical inference. The third part consists of 
the remaining three chapters on special topics. For course sequences that separate 
probability and mathematical statistics, the first part of the book can be used for a 
course in probability theory, followed by a course in mathematical statistics based 
on the second part and, possibly, one or more chapters on special topics. 

The reader will find here a wealth of material. Although the topics covered are 
fairly conventional, the discussions and special topics included are not. Many pre- 
sentations give far more depth than is usually the case in a book at this level. Some 
special features of the book are the following: 


1. A well-referenced chapter on the preliminaries.

2. About 550 problems, over 350 worked-out examples, about 200 remarks, and
about 150 references.

3. An advance warning to readers wherever the details become too involved.
They can skip the later portion of the section in question on first reading
without destroying the continuity in any way.

4. Many results on characterizations of distributions (Chapter 5).

5. Proof of the central limit theorem by the method of operators and proof of
the strong law of large numbers (Chapter 6).

6. A section on minimal sufficient statistics (Chapter 8).

7. A chapter on special tests (Chapter 10).

8. A careful presentation of the theory of confidence intervals, including
Bayesian intervals and shortest-length confidence intervals (Chapter 11).

9. A chapter on the general linear hypothesis, which carries linear models
through to their use in basic analysis of variance (Chapter 12).

10. Sections on nonparametric estimation and robustness (Chapter 13).

11. Two sections on sequential estimation (Chapter 14).


The contents of this book were used in a one-year (two-semester) course that I 
taught three times at the Catholic University of America and once in a three-quarter 
course at Bowling Green State University. In the fall of 1973 my colleague, Professor 
Eugene Lukacs, taught the first quarter of this same course on the basis of my notes, 
which eventually became this book. I have always been able to cover this book (with 
few omissions) in a one-year course, lecturing three hours a week. An hour-long 
problem session every week is conducted by a senior graduate student. 

In a book of this size there are bound to be some misprints, errors, and ambiguities 
of presentation. I shall be grateful to any reader who brings these to my attention. 


V. K. ROHATGI 


Bowling Green, Ohio 
February 1975 
Numbering and References


The book is divided into 13 chapters, numbered 1 through 13. Each chapter is divided 
into several sections. Lemmas, theorems, equations, definitions, remarks, figures, and 
so on, are numbered consecutively within each section. Thus Theorem i.j.k refers 
to the kth theorem in Section j of Chapter i, Section i.j refers to the jth section of 
Chapter i, and so on. Theorem j refers to the jth theorem of the section in which it 
appears. A similar convention is used for equations except that equation numbers are 
enclosed in parentheses. Each section is followed by a set of problems for which the 
same numbering system is used. 

References are given at the end of the book and are denoted in the text by numbers 
enclosed in square brackets, [ ]. If a citation is to a book, the notation ([i, p. j])
refers to the jth page of the reference numbered [i]. 

A word about the proofs of results stated without proof in this book: If a reference 
appears immediately following or preceding the statement of a result, it generally 
means that the proof is beyond the scope of this text. If no reference is given, it 
indicates that the proof is left to the reader. Sometimes the reader is asked to supply 
the proof as a problem. 
CHAPTER 1


Probability 


1.1 INTRODUCTION


The theory of probability had its origin in gambling and games of chance. It owes 
much to the curiosity of gamblers who pestered their friends in the mathematical 
world with all sorts of questions. Unfortunately, this association with gambling con- 
tributed to very slow and sporadic growth of probability theory as a mathematical 
discipline. The mathematicians of the day took little or no interest in the develop- 
ment of any theory but looked only at the combinatorial reasoning involved in each 
problem. 

The first attempt at some mathematical rigor is credited to Laplace. In his monumental
work, Théorie analytique des probabilités (1812), Laplace gave the classical
definition of the probability of an event that can occur only in a finite number of 
ways as the proportion of the number of favorable outcomes to the total number of 
all possible outcomes, provided that all the outcomes are equally likely. According 
to this definition, computation of the probability of events was reduced to combina- 
torial counting problems. Even in those days, this definition was found inadequate. 
In addition to being circular and restrictive, it did not answer the question of what 
probability is; it only gave a practical method of computing the probabilities of some 
simple events. 

An extension of the classical definition of Laplace was used to evaluate the
probabilities of events with infinitely many outcomes. The notion of equal likelihood of
certain events played a key role in this development. According to this extension,
if Ω is some region with a well-defined measure (length, area, volume, etc.), the
probability that a point chosen at random lies in a subregion A of Ω is the ratio
measure(A)/measure(Ω). Many problems of geometric probability were solved
using this extension. The trouble is that one can define “at random” in any way one
pleases, and different definitions lead to different answers. For example, Joseph
Bertrand, in his book Calcul des probabilités (Paris, 1889), cited a number of
problems in geometric probability where the result depended on the method of solution.
In Example 1.3.9 we discuss the famous Bertrand paradox and show that in reality
there is nothing paradoxical about Bertrand’s paradoxes; once we define probability
spaces carefully, the paradox is resolved. Nevertheless, difficulties encountered in
the field of geometric probability have been largely responsible for the slow growth 
of probability theory and its tardy acceptance by mathematicians as a mathematical 
discipline. 

The mathematical theory of probability as we know it today is of comparatively 
recent origin. It was A. N. Kolmogorov who axiomatized probability in his funda- 
mental work, Foundations of the Theory of Probability (Berlin), in 1933. According 
to this development, random events are represented by sets and probability is just a 
normed measure defined on these sets. This measure-theoretic development not only 
provided a logically consistent foundation for probability theory but also joined it to 
the mainstream of modern mathematics. 

In this book we follow Kolmogorov’s axiomatic development. In Section 1.2 we 
introduce the notion of a sample space. In Section 1.3 we state Kolmogorov’s axioms 
of probability and study some simple consequences of these axioms. Section 1.4 is 
devoted to the computation of probability on finite sample spaces. Section 1.5 deals 
with conditional probability and Bayes rule, and Section 1.6 examines the indepen- 
dence of events. 


1.2 SAMPLE SPACE 


In most branches of knowledge, experiments are a way of life. In probability and
statistics, too, we concern ourselves with special types of experiments. Consider the 
following examples. 


Example 1. A coin is tossed. Assuming that the coin does not land on the side, 
there are two possible outcomes of the experiment: heads and tails. On any perfor- 
mance of this experiment, one does not know what the outcome will be. The coin 
can be tossed as many times as desired. 


Example 2. A roulette wheel is a circular disk divided into 38 equal sectors
numbered from 0 to 36 and 00. A ball is rolled on the edge of the wheel, and the wheel is
spun in the opposite direction. One bets on any of the 38 numbers or some
combination of them. One can also bet on a color, red or black. If the ball lands in the sector
numbered 32, say, anybody who bet on 32, or a combination including 32, wins; and
so on. In this experiment, all possible outcomes are known in advance, namely 00,
0, 1, 2, ..., 36, but on any performance of the experiment there is uncertainty as to
what the outcome will be, provided, of course, that the wheel is not rigged in any
manner. Clearly, the wheel can be spun any number of times.


Example 3. A manufacturer produces 12-in rulers. The experiment consists in 
measuring as accurately as possible the length of a ruler produced by the manufac- 
turer. Because of errors in the production process, one does not know what the true 
length of the ruler selected will be. It is clear, however, that the length will be, say, 
between 11 and 13 in., or, if one wants to be safe, between 6 and 18 in. 
Example 4. The length of life of a light bulb produced by a certain manufacturer
is recorded. In this case one does not know what the length of life will be for the
light bulb selected, but clearly one is aware in advance that it will be some number
between 0 and ∞ hours.


The experiments described above have certain common features. For each exper- 
iment, we know in advance all possible outcomes; that is, there are no surprises in 
store after any performance of the experiment. On any performance of the exper- 
iment, however, we do not know what the specific outcome will be; that is, there 
is uncertainty about the outcome on any performance of the experiment. Moreover, 
the experiment can be repeated under identical conditions. These features describe a 
random (or statistical) experiment. 


Definition 1. A random (or statistical) experiment is an experiment in which: 


(a) All outcomes of the experiment are known in advance. 

(b) Any performance of the experiment results in an outcome that is not known 
in advance. 

(c) The experiment can be repeated under identical conditions. 


In probability theory we study this uncertainty of a random experiment. It is
convenient to associate with each such experiment a set Ω, the set of all possible
outcomes of the experiment. To engage in any meaningful discussion about the
experiment, we associate with Ω a σ-field S of subsets of Ω. We recall that a σ-field is
a nonempty class of subsets of Ω that is closed under the formation of countable
unions and complements and contains the null set ∅.


Definition 2. The sample space of a statistical experiment is a pair (Ω, S), where


(a) Ω is the set of all possible outcomes of the experiment.
(b) S is a σ-field of subsets of Ω.


The elements of Ω are called sample points. Any set A ∈ S is known as an
event. Clearly, A is a collection of sample points. We say that an event A happens
if the outcome of the experiment corresponds to a point in A. Each one-point set is
known as a simple or elementary event. If the set Ω contains only a finite number of
points, we say that (Ω, S) is a finite sample space. If Ω contains at most a countable
number of points, we call (Ω, S) a discrete sample space. If, however, Ω contains
uncountably many points, we say that (Ω, S) is an uncountable sample space. In
particular, if Ω = Rₙ or some rectangle in Rₙ, we call it a continuous sample space.


Remark 1. The choice of S is an important one, and some remarks are in order.
If Ω contains at most a countable number of points, we can always take S to be the
class of all subsets of Ω. This is certainly a σ-field. Each one-point set is a member
of S and is the fundamental object of interest. Every subset of Ω is an event. If Ω
has uncountably many points, the class of all subsets of Ω is still a σ-field, but it is
much too large a class of sets to be of interest. One of the most important examples
of an uncountable sample space is the case in which Ω = R or Ω is an interval in R.
In this case we would like all one-point subsets of Ω and all intervals (closed, open,
or semiclosed) to be events. We use our knowledge of analysis to specify S. We will
not go into detail here except to recall that the class of all semiclosed intervals (a, b]
generates a class B₁ that is a σ-field on R. This class contains all one-point sets and
all intervals (finite or infinite). We take S = B₁. Since we will be dealing mostly
with the one-dimensional case, we write B instead of B₁. There are many subsets
of R that are not in B₁, but we do not demonstrate this fact here. We refer the reader
to Halmos [39], Royden [94], or Kolmogorov and Fomin [52] for further details.


Example 5. Let us toss a coin. The set Ω is the set of symbols H and T, where H
denotes head and T represents tail. Also, S is the class of all subsets of Ω, namely,
{{H}, {T}, {H, T}, ∅}. If the coin is tossed two times, then

$$\Omega = \{(H,H), (H,T), (T,H), (T,T)\}$$

and

$$\begin{aligned}
S = \{&\emptyset, \{(H,H)\}, \{(H,T)\}, \{(T,H)\}, \{(T,T)\}, \{(H,H),(H,T)\}, \{(H,H),(T,H)\},\\
&\{(H,H),(T,T)\}, \{(H,T),(T,H)\}, \{(T,T),(T,H)\}, \{(T,T),(H,T)\},\\
&\{(H,H),(H,T),(T,H)\}, \{(H,H),(H,T),(T,T)\}, \{(H,H),(T,H),(T,T)\},\\
&\{(H,T),(T,H),(T,T)\}, \Omega\},
\end{aligned}$$

where the first element of a pair denotes the outcome of the first toss, and the second
element, the outcome of the second toss. The event “at least one head” consists of
sample points (H, H), (H, T), (T, H). The event “at most one head” is the collection of
sample points (H, T), (T, H), (T, T).
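For a finite Ω the class of all subsets can be listed mechanically. The following is a minimal Python sketch, assuming sample points are encoded as tuples of the strings "H" and "T" (an encoding of ours, not the book's); the helper name power_set is likewise hypothetical. It enumerates the 16 events of the two-toss experiment and picks out the two events described above.

```python
from itertools import chain, combinations, product

# Sample points for two tosses: (first toss, second toss).
omega = list(product("HT", repeat=2))

def power_set(points):
    """Return every subset of `points` as a frozenset (the class of all events)."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1))]

events = power_set(omega)
at_least_one_head = frozenset(w for w in omega if "H" in w)
at_most_one_head = frozenset(w for w in omega if w.count("H") <= 1)

print(len(events))               # 2**4 = 16 events
print(sorted(at_least_one_head))  # [('H', 'H'), ('H', 'T'), ('T', 'H')]
print(sorted(at_most_one_head))   # [('H', 'T'), ('T', 'H'), ('T', 'T')]
```

Both named events appear among the 16 members of S, as they must when S is the class of all subsets.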


Example 6. A die is rolled n times. The sample space is the pair (Ω, S), where
Ω is the set of all n-tuples (x₁, x₂, ..., xₙ), xᵢ ∈ {1, 2, 3, 4, 5, 6}, i = 1, 2, ..., n,
and S is the class of all subsets of Ω. Ω contains 6ⁿ elementary events. The event A
that 1 shows at least once is the set

$$\begin{aligned}
A &= \{(x_1, x_2, \ldots, x_n): \text{at least one of the } x_i\text{'s is } 1\}\\
  &= \Omega - \{(x_1, x_2, \ldots, x_n): \text{none of the } x_i\text{'s is } 1\}\\
  &= \Omega - \{(x_1, x_2, \ldots, x_n): x_i \in \{2, 3, 4, 5, 6\},\ i = 1, 2, \ldots, n\}.
\end{aligned}$$
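The complement form of A makes it easy to count: the tuples avoiding 1 number 5ⁿ, so A contains 6ⁿ − 5ⁿ sample points. A small brute-force check in Python confirms this for a few values of n; the function name is ours and purely illustrative.

```python
from itertools import product

def count_at_least_one_ace(n):
    """Brute-force count of the n-tuples over {1, ..., 6} in which 1 shows at least once."""
    return sum(1 for x in product(range(1, 7), repeat=n) if 1 in x)

for n in range(1, 5):
    # Complement count: |A| = |Omega| - |tuples with no 1| = 6**n - 5**n.
    assert count_at_least_one_ace(n) == 6**n - 5**n
    print(n, count_at_least_one_ace(n), 6**n)
```

Brute force is only practical for small n, since Ω grows like 6ⁿ; the complement formula is what one uses in general.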
Example 7. A coin is tossed until the first head appears. Then

$$\Omega = \{H, (T, H), (T, T, H), (T, T, T, H), \ldots\},$$

and S is the class of all subsets of Ω. An equivalent way of writing Ω would be to
look at the number of tosses required for the first head. Clearly, this number can take
values 1, 2, 3, ..., so that Ω is the set of all positive integers. Thus S is the class of
all subsets of positive integers.


Example 8. Consider a pointer that is free to spin about the center of a circle. 
If the pointer is spun by an impulse, it will finally come to rest at some point. On 
the assumption that the mechanism is not rigged in any manner, each point on the 
circumference is a possible outcome of the experiment. The set Ω consists of all
points 0 ≤ x < 2πr, where r is the radius of the circle. Every one-point set {x} is
a simple event, namely, that the pointer will come to rest at x. The events of interest
are those in which the pointer stops at a point belonging to a specified arc. Here S is
taken to be the Borel σ-field of subsets of [0, 2πr).


Example 9. A rod of length l is thrown onto a flat table, which is ruled with
parallel lines at distance 2l. The experiment consists in noting whether or not the rod
intersects one of the ruled lines.

Let r denote the distance from the center of the rod to the nearest ruled line, and
let θ be the angle that the axis of the rod makes with this line (Fig. 1). Every outcome
of this experiment corresponds to a point (r, θ) in the plane. As Ω we take the set of
all points (r, θ) in {(r, θ): 0 ≤ r ≤ l, 0 ≤ θ < π}. For S we take the Borel σ-field,
B₂, of subsets of Ω, that is, the smallest σ-field generated by rectangles of the form

$$\{(x, y): a < x \le b,\ c < y \le d,\ 0 \le a < b \le l,\ 0 \le c < d < \pi\}.$$





Fig. 1. 
Clearly, the rod will intersect a ruled line if and only if the center of the rod lies in
the area enclosed by the locus of the center of the rod (while one end touches the 
nearest line) and the nearest line (shaded area in Fig. 2). 
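When one end of the rod touches the nearest line, the center of the rod is at distance (l/2) sin θ from that line, so the shaded region is the set of points (r, θ) in Ω with r ≤ (l/2) sin θ. If, as in the equal-area assignment of geometric probability discussed in Section 1.3, probability is taken proportional to area, the chance of an intersection is (area of the region)/(area of Ω) = l/(πl) = 1/π, since the integral of (l/2) sin θ over [0, π) equals l. The Python sketch below estimates that area fraction by sampling points of Ω uniformly; the function name, sample size, and seed are arbitrary choices of ours.

```python
import math
import random

def rod_crosses_fraction(l=1.0, trials=200_000, seed=1):
    """Estimate the fraction of Omega = [0, l] x [0, pi) lying in the crossing event.

    A point (r, theta) is drawn uniformly from Omega; the rod crosses the
    nearest line exactly when r <= (l / 2) * sin(theta).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = rng.uniform(0.0, l)
        theta = rng.uniform(0.0, math.pi)
        if r <= 0.5 * l * math.sin(theta):
            hits += 1
    return hits / trials

estimate = rod_crosses_fraction()
print(estimate, 1 / math.pi)  # the two numbers should agree to about two decimals
```

The estimate does not depend on l, which matches the exact value 1/π obtained from the areas.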


Remark 2. From the discussion above it should be clear that in the discrete case
there is really no problem. Every one-point set is also an event, and S is the class of
all subsets of Ω. The problem, if there is any, arises only in regard to uncountable
sample spaces. The reader has to remember only that in this case not all subsets of
Ω are events. The case of most interest is the one in which Ω = Rₙ. In this case
roughly all sets that have a well-defined volume (or area or length) are events. Not
every set has the property in question, but sets that lack it are not easy to find and
one does not encounter them in practice.


PROBLEMS 1.2 


1. A club has five members, A, B, C, D, and E. It is required to select a chairman 
and a secretary. Assuming that one member cannot occupy both positions, write 
the sample space associated with these selections. What is the event that member 
A is an officeholder? 


2. In each of the following experiments, what is the sample space? 


(a) In a survey of families with three children, the genders of the children are 
recorded in increasing order of age. 


(b) The experiment consists of selecting four items from a manufacturer's output 
and observing whether or not each item is defective. 
(c) A given book is opened to any page, and the number of misprints is counted.


(d) Two cards are drawn from an ordinary deck of cards (i) with replacement, 
and (ii) without replacement. 


3. Let A, B, C be three arbitrary events on a sample space (Ω, S). What is the event
that only A occurs? What is the event that at least two of A, B, C occur? What is 
the event that both A and C, but not B, occur? What is the event that at most one 
of A, B, C occurs? 
Let (Ω, S) be the sample space associated with a statistical experiment. In this
section we define a probability set function and study some of its properties.


Definition 1. Let (Ω, S) be a sample space. A set function P defined on S is
called a probability measure (or simply, probability) if it satisfies the following
conditions:


(i) P(A) ≥ 0 for all A ∈ S.
(ii) P(Ω) = 1.
(iii) Let {Aⱼ}, Aⱼ ∈ S, j = 1, 2, ..., be a disjoint sequence of sets; that is,
Aⱼ ∩ Aₖ = ∅ for j ≠ k, where ∅ is the null set. Then

$$P\Big(\sum_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty} P(A_j), \tag{1}$$

where we have used the notation ∑ⱼ Aⱼ to denote the union of the disjoint sets Aⱼ.


We call P(A) the probability of event A. If there is no confusion, we will write
PA instead of P(A). Property (iii) is called countable additivity. That P∅ = 0 and
that P is also finitely additive follow from it.


Remark 1. If Ω is discrete and contains at most n (< ∞) points, each single-point
set {ωⱼ}, j = 1, 2, ..., n, is an elementary event, and it is sufficient to assign
probability to each {ωⱼ}. Then if A ∈ S, where S is the class of all subsets of Ω,
PA = ∑ P{ω}, the sum extending over all ω ∈ A. One such assignment is the equally
likely assignment or the assignment of uniform probabilities. According to this
assignment, P{ωⱼ} = 1/n, j = 1, 2, ..., n. Thus PA = m/n if A contains m elementary
events, 1 ≤ m ≤ n.
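The recipe in Remark 1 (assign a mass to each sample point, then add masses over A) translates directly into code. This is a minimal Python sketch; make_probability is a hypothetical helper of ours, and exact fractions are used so that the check that the masses sum to 1 is not spoiled by floating-point rounding.

```python
from fractions import Fraction

def make_probability(weights):
    """Build P on all subsets of a finite Omega from point masses P{w}.

    `weights` maps each sample point w to P{w}; the masses must be
    nonnegative and sum to 1, as required by axioms (i) and (ii).
    """
    if any(p < 0 for p in weights.values()):
        raise ValueError("point masses must be nonnegative")
    if sum(weights.values()) != 1:
        raise ValueError("point masses must sum to 1")

    def P(event):
        # PA is the sum of the point masses of the sample points in A.
        return sum((weights[w] for w in event), Fraction(0))

    return P

# Equally likely assignment on n = 6 points: PA = m/n for an event with m points.
omega = range(1, 7)
P = make_probability({w: Fraction(1, 6) for w in omega})
print(P({2, 4, 6}))  # 1/2
print(P(set()))      # 0
```

Finite additivity is automatic here, because the sum over a disjoint union splits into the sums over its parts.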


Remark 2. If Ω is discrete and contains a countably infinite number of points, one
cannot make an equally likely assignment of probabilities (equal masses would sum to
either 0 or ∞, never to 1). It suffices to make the assignment for each elementary
event. If A ∈ S, where S is the class of all subsets of Ω, define PA = ∑ P{ω}, the
sum extending over all ω ∈ A.
Remark 3. If Ω contains uncountably many points, each one-point set is an
elementary event, and again one cannot make an equally likely assignment of
probabilities. Indeed, one cannot assign positive probability to each elementary event
without violating the axiom PΩ = 1. In this case one assigns probabilities to compound
events consisting of intervals. For example, if Ω = [0, 1] and S is the Borel σ-field
of subsets of Ω, the assignment P(I) = length of I, where I is a subinterval of
Ω, defines a probability.


Definition 2. The triple (Ω, S, P) is called a probability space.


Definition 3. Let A ∈ S. We say that the odds for A are a to b if PA = a/(a + b),
and then the odds against A are b to a.


In many games of chance, probability is often stated in terms of odds against an
event. Thus in horse racing a two-dollar bet on a horse to win with odds of 2 to 1
(against) pays approximately six dollars if the horse wins the race. In this case the
probability of winning is 1/3.
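Definition 3 is a one-line conversion, and the horse-racing figure follows from it: odds of 2 to 1 against are odds of 1 to 2 for, so PA = 1/(1 + 2). A minimal Python sketch with hypothetical function names of ours:

```python
from fractions import Fraction

def probability_from_odds_for(a, b):
    """Odds for A of a to b mean PA = a / (a + b) (Definition 3)."""
    return Fraction(a, a + b)

def probability_from_odds_against(b, a):
    """Odds against A of b to a are the same as odds for A of a to b."""
    return probability_from_odds_for(a, b)

# Horse racing example: odds of 2 to 1 against winning.
print(probability_from_odds_against(2, 1))  # 1/3
```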


Example 1. Let us toss a coin. The sample space is (Ω, S), where Ω = {H, T}
and S is the σ-field of all subsets of Ω. Let us define P on S as follows:

$$P\{H\} = \tfrac{1}{2} \quad\text{and}\quad P\{T\} = \tfrac{1}{2}.$$

Then P clearly defines a probability. Similarly, other nonnegative assignments that
sum to 1, such as P{H} = 1, P{T} = 0, are probabilities defined on S. Indeed,

$$P\{H\} = p \quad\text{and}\quad P\{T\} = 1 - p \qquad (0 \le p \le 1)$$

defines a probability on (Ω, S).


Example 2. Let Ω = {1, 2, 3, ...} be the set of positive integers, and let S be the
class of all subsets of Ω. Define P on S as follows:

$$P\{i\} = \frac{1}{2^i}, \qquad i = 1, 2, \ldots.$$

Then

$$\sum_{i=1}^{\infty} P\{i\} = 1,$$

and P defines a probability.
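The claim that the masses sum to 1 is a geometric series: the partial sums equal 1 − 2⁻ⁿ. A short exact check in Python (the helper name is ours) shows the partial sums and how far each falls short of 1.

```python
from fractions import Fraction

def partial_sum(n):
    """Sum of P{i} = 1/2**i for i = 1, ..., n, computed exactly."""
    return sum((Fraction(1, 2**i) for i in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 10):
    # The partial sums equal 1 - 2**(-n), which tends to 1 as n grows.
    print(n, partial_sum(n), 1 - partial_sum(n))
```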


Example 3. Let Ω = (0, ∞) and S = B, the Borel σ-field on Ω. Define P as
follows: For each interval I ⊆ Ω,

$$PI = \int_I e^{-x}\,dx.$$

Clearly, PI ≥ 0, PΩ = 1, and P is countably additive by properties of integrals.
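Since −e^{−x} is an antiderivative of e^{−x}, an interval (a, b] ⊆ Ω receives probability e^{−a} − e^{−b}. The Python sketch below uses that closed form (the name P_interval is ours) to check PΩ = 1 and additivity over adjacent intervals; floating-point results agree up to rounding.

```python
import math

def P_interval(a, b):
    """P(a, b] = integral of exp(-x) over (a, b] = exp(-a) - exp(-b), for 0 <= a <= b <= inf."""
    return math.exp(-a) - math.exp(-b)

print(P_interval(0, math.inf))              # 1.0, the whole of Omega
print(P_interval(0, 1) + P_interval(1, 3))  # equals P_interval(0, 3) up to rounding
print(P_interval(0, 3))
```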
Theorem 1. P is monotone and subtractive; that is, if A, B ∈ S and A ⊆ B, then
PA ≤ PB and P(B − A) = PB − PA, where B − A = B ∩ Aᶜ, Aᶜ being the
complement of the event A.

Proof. If A ⊆ B, then

$$B = (A \cap B) + (B - A) = A + (B - A),$$

and it follows that PB = PA + P(B − A). Since P(B − A) ≥ 0, we also have PA ≤ PB.

Corollary. For all A ∈ S, 0 ≤ PA ≤ 1.

Remark 4. We wish to emphasize that if PA = 0 for some A ∈ S, we call A an
event with zero probability or a null event. However, it does not follow that A = ∅.
Similarly, if PB = 1 for some B ∈ S, we call B a certain event, but it does not
follow that B = Ω.

Theorem 2 (Addition Rule). If A, B ∈ S, then

$$P(A \cup B) = PA + PB - P(A \cap B). \tag{2}$$

Proof. Clearly,

$$A \cup B = (A - B) + (B - A) + (A \cap B)$$

and

$$A = (A \cap B) + (A - B), \qquad B = (A \cap B) + (B - A).$$

The result follows by the finite additivity of P, which is a consequence of countable
additivity.

Corollary 1. P is subadditive; that is, if A, B ∈ S, then

$$P(A \cup B) \le PA + PB. \tag{3}$$

Corollary 1 can be extended to any countable collection of events Aⱼ:

$$P\Big(\bigcup_j A_j\Big) \le \sum_j PA_j. \tag{4}$$

Corollary 2. If B = Aᶜ, then A and B are disjoint and

$$PA^c = 1 - PA. \tag{5}$$


The following generalization of (2) is left as an exercise. 
Theorem 3 (Principle of Inclusion–Exclusion). Let A₁, A₂, ..., Aₙ ∈ S.
Then

$$P\Big(\bigcup_{k=1}^{n} A_k\Big) = \sum_{k=1}^{n} PA_k - \sum_{k_1<k_2} P(A_{k_1} \cap A_{k_2}) + \sum_{k_1<k_2<k_3} P(A_{k_1} \cap A_{k_2} \cap A_{k_3}) + \cdots + (-1)^{n+1} P\Big(\bigcap_{k=1}^{n} A_k\Big). \tag{6}$$
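Identity (6) can be checked exactly on small finite spaces. The sketch below assumes the uniform assignment on Ω = {0, ..., N − 1} and randomly generated events (both choices are ours, for illustration only), evaluates the alternating sum over all k-fold intersections, and compares it with the probability of the union.

```python
import random
from fractions import Fraction
from itertools import combinations

def prob(event, n_points):
    """Uniform probability on Omega = {0, ..., n_points - 1}."""
    return Fraction(len(event), n_points)

def inclusion_exclusion(events, n_points):
    """Right-hand side of (6): alternating sum over all k-fold intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group), n_points)
    return total

rng = random.Random(0)
N = 20
events = [set(rng.sample(range(N), rng.randint(0, N))) for _ in range(4)]
lhs = prob(set.union(*events), N)
print(lhs, inclusion_exclusion(events, N))  # the two sides agree exactly
```

Exact fractions are used so the comparison is an equality rather than a tolerance check; the cost is exponential in the number of events, so this is only a sanity check.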


Example 4. A die is rolled twice. Let all the elementary events in Ω = {(i, j):
i, j = 1, 2, ..., 6} be assigned the same probability. Let A be the event that the
first throw shows a number ≤ 2, and B be the event that the second throw shows at
least 5. Then

$$\begin{aligned}
A &= \{(i, j): 1 \le i \le 2,\ j = 1, 2, \ldots, 6\},\\
B &= \{(i, j): 5 \le j \le 6,\ i = 1, 2, \ldots, 6\},\\
A \cap B &= \{(1, 5), (1, 6), (2, 5), (2, 6)\};
\end{aligned}$$

and

$$P(A \cup B) = PA + PB - P(A \cap B) = \frac{1}{3} + \frac{1}{3} - \frac{4}{36} = \frac{5}{9}.$$
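The value 5/9 can be confirmed by listing all 36 equally likely outcomes and counting directly. A minimal Python sketch (the encoding of outcomes as (first throw, second throw) pairs is ours):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (first throw, second throw)

def P(event):
    """Equally likely assignment: each of the 36 points has probability 1/36."""
    return Fraction(len(event), len(omega))

A = {(i, j) for (i, j) in omega if i <= 2}  # first throw shows at most 2
B = {(i, j) for (i, j) in omega if j >= 5}  # second throw shows at least 5

print(P(A | B))                # 5/9, by direct counting (20 of 36 points)
print(P(A) + P(B) - P(A & B))  # 5/9, by the addition rule (2)
```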
Example 5. A coin is tossed three times. Let us assign equal probability to each
of the 2³ elementary events in Ω. Let A be the event that at least one head shows up
in three throws. Then

$$P(A) = 1 - P(A^c) = 1 - P(\text{no heads}) = 1 - P(TTT) = \frac{7}{8}.$$


We next derive two useful inequalities. 


Theorem 4 (Bonferroni’s Inequality). Given n (> 1) events A₁, A₂, ..., Aₙ,

$$\sum_{i=1}^{n} PA_i - \sum_{i<j} P(A_i \cap A_j) \le P\Big(\bigcup_{i=1}^{n} A_i\Big) \le \sum_{i=1}^{n} PA_i. \tag{7}$$
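Proof. In view of (4), it suffices to prove the left side of (7). The proof is by
induction. The inequality on the left is true for n = 2 since

$$PA_1 + PA_2 - P(A_1 \cap A_2) = P(A_1 \cup A_2).$$

For n = 3,

$$P\Big(\bigcup_{i=1}^{3} A_i\Big) = \sum_{i=1}^{3} PA_i - \sum_{i<j} P(A_i \cap A_j) + P(A_1 \cap A_2 \cap A_3),$$

and the result holds. Assuming that (7) holds for 3 ≤ m ≤ n − 1, we show that it
also holds for m + 1:

$$\begin{aligned}
P\Big(\bigcup_{i=1}^{m+1} A_i\Big) &= P\Big(\bigcup_{i=1}^{m} A_i\Big) + PA_{m+1} - P\Big(\Big(\bigcup_{i=1}^{m} A_i\Big) \cap A_{m+1}\Big)\\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{i<j\le m} P(A_i \cap A_j) - P\Big(\bigcup_{i=1}^{m} (A_i \cap A_{m+1})\Big)\\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{i<j\le m} P(A_i \cap A_j) - \sum_{i=1}^{m} P(A_i \cap A_{m+1})\\
&= \sum_{i=1}^{m+1} PA_i - \sum_{i<j\le m+1} P(A_i \cap A_j).
\end{aligned}$$

Here the first equality is (2), the first inequality is the induction hypothesis, and the
second inequality follows from (4).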
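The two bounds in (7) can be exercised on many random events at once. This sketch assumes the uniform assignment on a 12-point Ω and random events (choices of ours, for illustration); bonferroni_bounds is a hypothetical helper returning the lower bound, the exact value, and the upper bound.

```python
import random
from fractions import Fraction
from itertools import combinations

def bonferroni_bounds(events, n_points):
    """Return (lower, exact, upper) from (7) under the uniform P on {0, ..., n_points - 1}."""
    def P(e):
        return Fraction(len(e), n_points)

    first = sum((P(a) for a in events), Fraction(0))  # sum of P A_i
    second = sum((P(a & b) for a, b in combinations(events, 2)), Fraction(0))  # sum over i < j
    exact = P(set().union(*events))
    return first - second, exact, first

rng = random.Random(42)
for _ in range(1000):
    events = [set(rng.sample(range(12), rng.randint(0, 12))) for _ in range(5)]
    lower, exact, upper = bonferroni_bounds(events, 12)
    assert lower <= exact <= upper
print("the bounds in (7) held in every trial")
```

A passing random test is of course no proof; it is only a quick way to see the inequality, and how loose it can be, on concrete events.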


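Theorem 5 (Boole’s Inequality). For any two events A and B,

$$P(A \cap B) \ge 1 - PA^c - PB^c. \tag{8}$$

Corollary 1. Let {Aⱼ}, j = 1, 2, ..., be a countable sequence of events; then

$$P\Big(\bigcap_{j=1}^{\infty} A_j\Big) \ge 1 - \sum_{j=1}^{\infty} P(A_j^c). \tag{9}$$

Proof. Take

$$B = \bigcap_{j=2}^{\infty} A_j \quad\text{and}\quad A = A_1$$

in (8), and note that, by (4),

$$PB^c = P\Big(\bigcup_{j=2}^{\infty} A_j^c\Big) \le \sum_{j=2}^{\infty} PA_j^c.$$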
Corollary 2 (Implication Rule). If A, B, C ∈ S and A and B imply C, then

$$PC^c \le PA^c + PB^c. \tag{10}$$


Let {Aₙ} be a sequence of sets. The set of all points ω ∈ Ω that belong to Aₙ
for infinitely many values of n is known as the limit superior of the sequence and is
denoted by

$$\limsup_{n\to\infty} A_n \quad\text{or}\quad \overline{\lim_{n\to\infty}}\, A_n.$$

The set of all points that belong to Aₙ for all but a finite number of values of n is
known as the limit inferior of the sequence {Aₙ} and is denoted by

$$\liminf_{n\to\infty} A_n \quad\text{or}\quad \underline{\lim_{n\to\infty}}\, A_n.$$

If

$$\underline{\lim_{n\to\infty}}\, A_n = \overline{\lim_{n\to\infty}}\, A_n,$$

we say that the limit exists and write lim_{n→∞} Aₙ for the common set and call it the
limit set.

We have

$$\underline{\lim_{n\to\infty}}\, A_n = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k \subseteq \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k = \overline{\lim_{n\to\infty}}\, A_n.$$

If the sequence {Aₙ} is such that Aₙ ⊆ Aₙ₊₁ for n = 1, 2, ..., it is called
nondecreasing; if Aₙ ⊇ Aₙ₊₁, n = 1, 2, ..., it is called nonincreasing. If the sequence Aₙ
is nondecreasing, we write Aₙ ↑; if Aₙ is nonincreasing, we write Aₙ ↓. Clearly, if
Aₙ ↑ or Aₙ ↓, the limit exists and we have

$$\lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n \quad\text{if } A_n \uparrow$$

and

$$\lim_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} A_n \quad\text{if } A_n \downarrow.$$
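Neither limit can be computed from finitely many terms in general, but for a periodic sequence (Aₙ₊ₚ = Aₙ for all n) every tail Aₙ, Aₙ₊₁, ... runs through a whole period. Then each tail union is the union over one period and each tail intersection is the intersection over one period, so the formulas above reduce to finite operations. A Python sketch under that periodicity assumption (periodic_limits is our own name):

```python
def periodic_limits(period):
    """limsup and liminf of the periodic sequence A_1, A_2, ... that repeats `period`.

    Every tail A_n, A_{n+1}, ... runs through the whole period, so
    limsup A_n = union over one period (points in infinitely many A_n) and
    liminf A_n = intersection over one period (points in all but finitely many A_n).
    """
    sets = [frozenset(s) for s in period]
    return frozenset().union(*sets), frozenset.intersection(*sets)

# A_n = {0, 2} for odd n and {1, 2} for even n.
lim_sup, lim_inf = periodic_limits([{0, 2}, {1, 2}])
print(sorted(lim_sup), sorted(lim_inf))  # [0, 1, 2] [2]
```

Here the lower limit is strictly smaller than the upper limit, so this sequence has no limit set.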


Theorem 6. Let {Aₙ} be a nondecreasing sequence of events in S; that is, Aₙ ∈ S,
n = 1, 2, ..., and

$$A_n \supseteq A_{n-1}, \qquad n = 2, 3, \ldots.$$

Then

$$\lim_{n\to\infty} PA_n = P\Big(\lim_{n\to\infty} A_n\Big) = P\Big(\bigcup_{n=1}^{\infty} A_n\Big). \tag{11}$$

Proof. Let

$$A = \bigcup_{j=1}^{\infty} A_j.$$

Then

$$A = A_n + \sum_{j=n}^{\infty} (A_{j+1} - A_j).$$

By countable additivity we have

$$PA = PA_n + \sum_{j=n}^{\infty} P(A_{j+1} - A_j),$$

and letting n → ∞, we see that

$$PA = \lim_{n\to\infty} PA_n + \lim_{n\to\infty} \sum_{j=n}^{\infty} P(A_{j+1} - A_j).$$

The second term on the right is zero: the series with terms P(Aⱼ₊₁ − Aⱼ), j = 1, 2, ...,
has nonnegative terms and sum at most 1, so its tail sums tend to zero as n → ∞. The
result follows.


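Corollary. Let {Aₙ} be a nonincreasing sequence of events in S. Then

$$\lim_{n\to\infty} PA_n = P\Big(\lim_{n\to\infty} A_n\Big) = P\Big(\bigcap_{n=1}^{\infty} A_n\Big). \tag{12}$$

Proof. Consider the nondecreasing sequence of events {Aₙᶜ}. Then

$$\lim_{n\to\infty} A_n^c = \bigcup_{j=1}^{\infty} A_j^c = \Big(\bigcap_{j=1}^{\infty} A_j\Big)^c = A^c, \qquad\text{where } A = \bigcap_{j=1}^{\infty} A_j.$$

It follows from Theorem 6 that

$$\lim_{n\to\infty} PA_n^c = P\Big(\lim_{n\to\infty} A_n^c\Big) = PA^c.$$

In other words,

$$\lim_{n\to\infty} (1 - PA_n) = 1 - PA,$$

as asserted.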


Remark 5. Theorem 6 and its corollary will be used quite frequently in subse- 
quent chapters. Property (11) is called the continuity of P from below, and (12) is 
known as the continuity of P from above. Thus Theorem 6 and its corollary assure 
us that the set function P is continuous from above and below. 
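Continuity can be watched numerically with the measure of Example 3 (restated below so the block stands alone; P_interval is our name). The sets Aₙ = (0, n] increase to Ω, so PAₙ = 1 − e⁻ⁿ should climb to PΩ = 1, while Bₙ = (0, 1/n] decrease to ∅, so PBₙ should fall to 0. This is an illustration under those choices of sequences, not part of the proof.

```python
import math

def P_interval(a, b):
    """Example 3 measure on (0, inf): P(a, b] = exp(-a) - exp(-b)."""
    return math.exp(-a) - math.exp(-b)

for n in (1, 10, 100, 1000):
    up = P_interval(0, n)        # A_n = (0, n] increases to Omega, so P A_n -> 1
    down = P_interval(0, 1 / n)  # B_n = (0, 1/n] decreases to the empty set, so P B_n -> 0
    print(n, up, down)
```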


We conclude this section with some remarks concerning the use of the word ran- 
dom in this book. In probability theory random has essentially three meanings. First, 
in sampling from a finite population, a sample is said to be a random sample if at 
each draw all members available for selection have the same probability of being 
included. We discuss sampling from a finite population in Section 1.4. Second, we 
speak of a random sample from a probability distribution. This notion is formal- 
ized in Section 7.2. The third meaning arises in the context of geometric probability, 
where statements such as “a point is chosen randomly from the interval (a, b)” and
“a point is picked randomly from a unit square" are frequently encountered. Once we 
have studied random variables and their distributions, problems involving geometric 
probabilities may be formulated in terms of problems involving independent uni- 
formly distributed random variables, and these statements can be given appropriate 
interpretations. 

Roughly speaking, these statements involve a certain assignment of probability. The word random expresses our desire to assign equal probability to sets of equal lengths, areas, or volumes. Let $\Omega \subset \mathbb{R}^n$ be a given set, and let $A$ be a subset of $\Omega$. We are interested in the probability that a randomly chosen point in $\Omega$ falls in $A$. Here randomly chosen means that the point may be any point of $\Omega$ and that the probability of its falling in some subset $A$ of $\Omega$ is proportional to the measure of $A$ (independent of the location and shape of $A$). Assuming that both $A$ and $\Omega$ have well-defined finite measures (length, area, volume, etc.), we define

$$PA = \frac{\text{measure}(A)}{\text{measure}(\Omega)}.$$

[In the language of measure theory we are assuming that $\Omega$ is a measurable subset of $\mathbb{R}^n$ that has a finite, positive Lebesgue measure. If $A$ is any measurable set, $PA = \mu(A)/\mu(\Omega)$, where $\mu$ is the $n$-dimensional Lebesgue measure.] Thus, if a point is chosen at random from the interval $(a, b)$, the probability that it lies in the interval $(c, d)$, $a \le c < d \le b$, is $(d - c)/(b - a)$. Moreover, the probability that the randomly selected point lies in any interval of length $(d - c)$ is the same.
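The interval case can be checked empirically. The following sketch uses Python's standard `random` module; the function names and the seed are illustrative choices. It compares the exact value $(d - c)/(b - a)$ with the fraction of simulated uniform points that land in $(c, d)$. Two intervals of the same length give the same probability, as claimed.

```python
import random

def interval_prob(a, b, c, d):
    """Exact P{point chosen at random from (a, b) lies in (c, d)}, a <= c < d <= b."""
    return (d - c) / (b - a)

def simulate_interval_prob(a, b, c, d, trials=200_000, seed=1):
    """Fraction of `trials` uniform points on (a, b) that fall in (c, d)."""
    rng = random.Random(seed)
    hits = sum(c < rng.uniform(a, b) < d for _ in range(trials))
    return hits / trials

# Two intervals of the same length (1) inside (0, 4): both probabilities are 1/4.
for c, d in ((1, 2), (2.5, 3.5)):
    print((c, d), interval_prob(0, 4, c, d), simulate_interval_prob(0, 4, c, d))
```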

We present some examples. 


Example 6. A point is picked “at random” from a unit square. Let $\Omega = \{(x, y): 0 \le x \le 1, 0 \le y \le 1\}$. It is clear that all rectangles and their unions must be in $\mathcal{S}$; so, too, should be all circles in the unit square, since the area of a circle is also well defined. Indeed, every set that has a well-defined area has to be in $\mathcal{S}$. We choose $\mathcal{S} = \mathcal{B}_2$, the Borel σ-field generated by rectangles in $\Omega$. As for the probability assignment, if $A \in \mathcal{S}$, we assign $PA$ to $A$, where $PA$ is the area of the set $A$. If $A = \{(x, y): 0 < x < \tfrac12, \tfrac12 < y < 1\}$, then $PA = \tfrac14$. If $B$ is a circle with center $(\tfrac12, \tfrac12)$ and radius $\tfrac12$, then $PB = \pi(\tfrac12)^2 = \pi/4$. If $C$ is the set of all points that are at most a unit distance from the origin, then $PC = \pi/4$ (see Figs. 1 to 3).

Fig. 1. $A = \{(x, y): 0 < x < \tfrac12, \tfrac12 < y < 1\}$.


Fig. 2. $B = \{(x, y): (x - \tfrac12)^2 + (y - \tfrac12)^2 \le \tfrac14\}$.
Fig. 3. $C = \{(x, y) \in \Omega: x^2 + y^2 \le 1\}$.
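Because $PA$ in Example 6 is the area of $A$, a Monte Carlo check is natural. The proportion of uniformly drawn points of the unit square that land in each of $A$, $B$, $C$ should be close to $1/4$, $\pi/4$, and $\pi/4$. This is a sketch only; the sample size and seed are arbitrary.

```python
import math
import random

def estimate_areas(trials=200_000, seed=2):
    """Monte Carlo estimates of PA, PB, PC in Example 6. A point (x, y) is drawn
    uniformly from the unit square, so each probability is the area of the set."""
    rng = random.Random(seed)
    hits = {"A": 0, "B": 0, "C": 0}
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x < 0.5 and y > 0.5:                      # A: 0 < x < 1/2, 1/2 < y < 1
            hits["A"] += 1
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:  # B: disc, center (1/2, 1/2), radius 1/2
            hits["B"] += 1
        if x * x + y * y <= 1:                       # C: within unit distance of the origin
            hits["C"] += 1
    return {name: count / trials for name, count in hits.items()}

est = estimate_areas()
print(est)
print({"A": 0.25, "B": math.pi / 4, "C": math.pi / 4})  # exact values
```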


Example 7 (Buffon's Needle Problem). We return to Example 1.2.9. A needle (rod) of length $l$ is tossed at random on a plane that is ruled with a series of parallel lines a distance $2l$ apart. We wish to find the probability that the needle will intersect one of the lines. Denote by $r$ the distance from the center of the needle to the closest line and by $\theta$ the angle that the needle forms with this line. A necessary and sufficient condition for the needle to intersect the line is then $r < (l/2)\sin\theta$. The needle will intersect the nearest line if and only if its center falls in the shaded region in Fig. 1.2.2. We assign probability to an event $A$ as follows:

$$PA = \frac{\text{area of set } A}{l\pi}.$$

Thus the required probability is

$$\frac{1}{l\pi}\int_0^{\pi} \frac{l}{2}\sin\theta \, d\theta = \frac{1}{\pi}.$$

Here we have interpreted at random to mean that the position of the needle is characterized by a point $(r, \theta)$ which lies in the rectangle $0 \le r \le l$, $0 \le \theta \le \pi$. We have assumed that the probability that the point $(r, \theta)$ lies in any arbitrary subset of this rectangle is proportional to the area of this set. Roughly, this means that “all positions of the midpoint of the needle are assigned the same weight and all directions of the needle are assigned the same weight.”
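The interpretation of at random in Example 7 can be simulated directly. The sketch draws $(r, \theta)$ uniformly from the rectangle $[0, l] \times [0, \pi]$ and counts how often $r < (l/2)\sin\theta$. Since $\pi$ is used to sample $\theta$, this only checks the probability assignment; it is not an independent way of computing $\pi$. The function name, sample size, and seed are illustrative.

```python
import math
import random

def buffon_estimate(l=1.0, trials=200_000, seed=3):
    """Simulate Example 7: lines are 2l apart, r ~ uniform(0, l) is the distance from the
    needle's center to the nearest line, theta ~ uniform(0, pi) is the needle's angle.
    The needle meets a line iff r < (l / 2) * sin(theta)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        r = rng.uniform(0, l)
        theta = rng.uniform(0, math.pi)
        if r < (l / 2) * math.sin(theta):
            crossings += 1
    return crossings / trials

p_hat = buffon_estimate()
print(p_hat, 1 / math.pi)  # the estimate should be close to 1/pi, about 0.3183
```

The answer does not depend on $l$, which can be checked by calling `buffon_estimate(l=2.5)`.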


Example 8. An interval of length 1, say $(0, 1)$, is divided into three intervals by choosing two points at random. What is the probability that the three line segments form a triangle?

It is clear that a necessary and sufficient condition for the three segments to form a triangle is that the length of any one of the segments be less than the sum of the other two. Let $x, y$ be the abscissas of the two points chosen at random. Then we must have either

$$0 < x < \tfrac12 < y < 1 \quad \text{and} \quad y - x < \tfrac12$$

or

$$0 < y < \tfrac12 < x < 1 \quad \text{and} \quad x - y < \tfrac12.$$

This is precisely the shaded area in Fig. 4. It follows that the required probability is $\tfrac14$.

If it is specified in advance that the point $x$ is chosen at random from $(0, \tfrac12)$, and the point $y$ at random from $(\tfrac12, 1)$, we must have

$$0 < x < \tfrac12, \quad \tfrac12 < y < 1,$$

and

$$y - x < x + 1 - y \quad \text{or} \quad 2(y - x) < 1.$$

In this case the area bounded by these lines is the shaded area in Fig. 5, and it follows that the required probability is $\tfrac12$.

Note the difference in sample spaces in the two computations made above.








Fig. 4. $\{(x, y): 0 < x < \tfrac12 < y < 1 \text{ and } (y - x) < \tfrac12, \text{ or } 0 < y < \tfrac12 < x < 1 \text{ and } (x - y) < \tfrac12\}$.
Fig. 5. $\{(x, y): 0 < x < \tfrac12, \tfrac12 < y < 1, \text{ and } 2(y - x) < 1\}$.
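The two computations in Example 8 use different sample spaces. A simulation makes the difference concrete. The first estimate draws both points uniformly from $(0, 1)$; the second draws $x$ from $(0, \tfrac12)$ and $y$ from $(\tfrac12, 1)$. The estimates should be near $\tfrac14$ and $\tfrac12$. The helper names and the seed are our own choices.

```python
import random

def forms_triangle(x, y):
    """Pieces cut from (0, 1) at x and y form a triangle iff each piece is shorter than
    the other two together, i.e. (since the pieces sum to 1) each piece is < 1/2."""
    lo, hi = min(x, y), max(x, y)
    return max(lo, hi - lo, 1 - hi) < 0.5

def estimate(trials=200_000, seed=4):
    rng = random.Random(seed)
    # First sample space: both points uniform on (0, 1).
    p1 = sum(forms_triangle(rng.random(), rng.random()) for _ in range(trials)) / trials
    # Second sample space: x uniform on (0, 1/2), y uniform on (1/2, 1).
    p2 = sum(forms_triangle(rng.uniform(0, 0.5), rng.uniform(0.5, 1))
             for _ in range(trials)) / trials
    return p1, p2

p1, p2 = estimate()
print(p1, p2)  # close to 1/4 and 1/2 respectively
```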


Example 9 (Bertrand's Paradox). A chord is drawn at random in the unit cir- 
cle. What is the probability that the chord is longer than the side of the equilateral 
triangle inscribed in the circle? 

We present here three solutions to this problem, depending on how we interpret 
the phrase at random. The paradox is resolved once we define the probability spaces 
carefully. 


SOLUTION 1. Since the length of a chord is uniquely determined by the position of its midpoint, choose a point $C$ at random in the circle and draw a line through $C$ and $O$, the center of the circle (Fig. 6). Draw the chord through $C$ perpendicular to the line $OC$. If $l_1$ is the length of the chord with $C$ as midpoint, $l_1 > \sqrt{3}$ if and only if $C$ lies inside the circle with center $O$ and radius $\tfrac12$. Thus $PA = \pi(\tfrac12)^2/\pi = \tfrac14$.

In this case $\Omega$ is the circle with center $O$ and radius 1, and the event $A$ is the concentric circle with center $O$ and radius $\tfrac12$. $\mathcal{S}$ is the usual Borel σ-field of subsets of $\Omega$.


SOLUTION 2. Because of symmetry, we may fix one endpoint of the chord at some point $P$ and then choose the other endpoint $P_1$ at random. Let the probability that $P_1$ lies on an arbitrary arc of the circle be proportional to the length of this arc. Now the inscribed equilateral triangle having $P$ as one of its vertices divides the circumference into three equal parts. A chord drawn through $P$ will be longer than the side of the triangle if and only if the other endpoint $P_1$ (Fig. 7) of the chord lies on that one-third of the circumference that is opposite $P$. It follows that the required probability is $\tfrac13$. In this case $\Omega = [0, 2\pi]$, $\mathcal{S} = \mathcal{B}_1 \cap \Omega$, and $A = [2\pi/3, 4\pi/3]$.
Fig. 6.

Fig. 7.


SOLUTION 3. Note that the length of a chord is determined uniquely by the distance of its midpoint from the center of the circle. Due to the symmetry of the circle, we assume that the midpoint of the chord lies on a fixed radius, $OM$, of the circle (Fig. 8). The probability that the midpoint $M$ lies in a given segment of the radius through $M$ is then proportional to the length of this segment. Clearly, the chord will be longer than the side of the inscribed equilateral triangle if the length of $OM$ is less than radius/2. It follows that the required probability is $\tfrac12$.
Fig. 8.
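The three answers to Bertrand's problem come from three different probability spaces. The sketch below simulates each interpretation in turn:

- uniform midpoint in the disc (Solution 1);
- uniform second endpoint on the circumference (Solution 2);
- uniform midpoint along a fixed radius (Solution 3).

It then counts chords longer than $\sqrt{3}$, the side of the inscribed equilateral triangle. The function names, sample size, and seed are illustrative assumptions.

```python
import math
import random

SIDE = math.sqrt(3)  # side of the equilateral triangle inscribed in the unit circle

def chord_length(d):
    """Length of the chord of the unit circle whose midpoint is at distance d from O."""
    return 2 * math.sqrt(max(0.0, 1 - d * d))

def solution1(rng):
    # Midpoint C uniform over the disc (rejection sampling from the square [-1, 1]^2).
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y < 1:
            return chord_length(math.hypot(x, y))

def solution2(rng):
    # Endpoint P fixed at angle 0; the other endpoint at a uniform angle phi in [0, 2*pi].
    phi = rng.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin(phi / 2))

def solution3(rng):
    # Midpoint M uniform along a fixed radius OM.
    return chord_length(rng.uniform(0, 1))

def estimate(method, trials=100_000, seed=5):
    """Fraction of simulated chords longer than SIDE under the given interpretation."""
    rng = random.Random(seed)
    return sum(method(rng) > SIDE for _ in range(trials)) / trials

for method, exact in ((solution1, 1 / 4), (solution2, 1 / 3), (solution3, 1 / 2)):
    print(method.__name__, round(estimate(method), 3), "exact:", round(exact, 3))
```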


PROBLEMS 1.3 


1. Let $\Omega$ be the set of all nonnegative integers and $\mathcal{S}$ the class of all subsets of $\Omega$. In each of the following cases, does $P$ define a probability on $(\Omega, \mathcal{S})$?

(a) For $A \in \mathcal{S}$, let

$$PA = \sum_{x \in A} \frac{e^{-\lambda}\lambda^x}{x!}, \quad \lambda > 0.$$

(b) For $A \in \mathcal{S}$, let

$$PA = \sum_{x \in A} p(1 - p)^x, \quad 0 < p < 1.$$

(c) For $A \in \mathcal{S}$, let $PA = 1$ if $A$ has a finite number of elements, and $PA = 0$ otherwise.


2. Let $\Omega = \mathbb{R}$ and $\mathcal{S} = \mathcal{B}$. In each of the following cases, does $P$ define a probability on $(\Omega, \mathcal{S})$?

(a) For each interval $I$, let

$$PI = \int_I \frac{1}{\pi}\,\frac{1}{1 + x^2}\,dx.$$

(b) For each interval $I$, let $PI = 1$ if $I$ is an interval of finite length, and $PI = 0$ if $I$ is an infinite interval.

(c) For each interval $I$, let $PI = 0$ if $I \subseteq (-\infty, 1)$ and $PI = \int_I dx$ if $I \subseteq [1, \infty)$. [If $I = I_1 + I_2$, where $I_1 \subseteq (-\infty, 1)$ and $I_2 \subseteq [1, \infty)$, then $PI = PI_2$.]

3. Let $A$ and $B$ be two events such that $B \supset A$. What is $P(A \cup B)$? What is $P(A \cap B)$? What is $P(A - B)$?


4. In Problem 1(a) and (b), let $A = \{\text{all integers} > 2\}$, $B = \{\text{all nonnegative integers} < 3\}$, and $C = \{\text{all integers } x, 3 < x < 6\}$. Find $PA$, $PB$, $PC$, $P(A \cap B)$, $P(A \cup B)$, $P(B \cup C)$, $P(A \cap C)$, and $P(B \cap C)$.


5. In Problem 2(a), let $A$ be the event $A = \{x: x \ge 0\}$. Find $PA$. Also find $P\{x: x > 0\}$.


6. A box contains 1000 light bulbs. The probability that there is at least 1 defective bulb in the box is 0.1, and the probability that there are at least 2 defective bulbs is 0.05. Find the probability in each of the following cases:

(a) The box contains no defective bulbs.
(b) The box contains exactly 1 defective bulb.
(c) The box contains at most 1 defective bulb.


7. Two points are chosen at random on a line of unit length. Find the probability that each of the three line segments so formed will have a length $> \tfrac14$.


8. Find the probability that the sum of two randomly chosen positive numbers (both $\le 1$) will not exceed 1 and that their product will be $\le \tfrac29$.


9. Prove Theorem 3.


10. Let $\{A_n\}$ be a sequence of events such that $A_n \to A$ as $n \to \infty$. Show that $PA_n \to PA$ as $n \to \infty$.


11. The base and altitude of a right triangle are obtained by picking points randomly from $[0, a]$ and $[0, b]$, respectively. Show that the probability that the area of the triangle so formed will be less than $ab/4$ is $(1 + \ln 2)/2$.


12. A point $X$ is chosen at random on a line segment $AB$. (a) Show that the probability that the ratio of lengths $AX/BX$ is smaller than $a$ $(a > 0)$ is $a/(1 + a)$. (b) Show that the probability that the ratio of the length of the shorter segment to that of the larger segment is less than $\tfrac13$ is $\tfrac12$.


1.4 COMBINATORICS: PROBABILITY ON FINITE SAMPLE SPACES


In this section we restrict attention to sample spaces that have at most a finite number of points. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and let $\mathcal{S}$ be the σ-field of all subsets of $\Omega$. For any $A \in \mathcal{S}$,

$$PA = \sum_{\omega_j \in A} P\{\omega_j\}.$$


Definition 1. An assignment of probability is said to be equally likely (or uniform) if each elementary event in $\Omega$ is assigned the same probability. Thus, if $\Omega$ contains $n$ points $\omega_j$, $P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$.

With this assignment

$$PA = \frac{\text{number of elementary events in } A}{\text{total number of elementary events in } \Omega}. \tag{1}$$
Example 1. A coin is tossed twice. The sample space consists of four points. Under the uniform assignment, each of four elementary events is assigned probability $\tfrac14$.


Example 2. Three dice are rolled. The sample space consists of $6^3$ points. Each one-point set is assigned probability $1/6^3$.


In games of chance we usually deal with finite sample spaces where uniform prob- 
ability is assigned to all simple events. The same is the case in sampling schemes. In 
such instances the computation of the probability of an event A reduces to a combi- 
natorial counting problem. We therefore consider some rules of counting. 


Rule 1. Given a collection of $n_1$ elements $a_{11}, a_{12}, \ldots, a_{1n_1}$, $n_2$ elements $a_{21}, a_{22}, \ldots, a_{2n_2}$, and so on, up to $n_k$ elements $a_{k1}, a_{k2}, \ldots, a_{kn_k}$, it is possible to form $n_1 \cdot n_2 \cdots n_k$ ordered $k$-tuples $(a_{1j_1}, a_{2j_2}, \ldots, a_{kj_k})$ containing one element of each kind, $1 \le j_i \le n_i$, $i = 1, 2, \ldots, k$.
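Rule 1 can be checked by brute force. `itertools.product` lists every ordered $k$-tuple with one element of each kind, and the count should equal $n_1 n_2 \cdots n_k$. The element labels below are hypothetical placeholders. The last line recovers the $6^3$ outcomes of Example 2.

```python
from itertools import product
from math import prod

def count_k_tuples(*kinds):
    """Count ordered k-tuples (one element from each kind) by listing them explicitly."""
    return sum(1 for _ in product(*kinds))

kinds = (["a11", "a12"], ["a21", "a22", "a23"], ["a31", "a32", "a33", "a34"])
assert count_k_tuples(*kinds) == prod(len(k) for k in kinds)  # 2 * 3 * 4 = 24

# Example 2: three dice give 6 * 6 * 6 = 6^3 outcomes.
die = range(1, 7)
print(count_k_tuples(die, die, die))  # 216
```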


Example 3. Here $r$ distinguishable balls are to be placed in $n$ cells. This amounts to choosing one cell for each ball. The sample space consists of $n^r$ $r$-tuples $(i_1, i_2, \ldots, i_r)$, where $i_j$ is the cell number of the $j$th ball, $j = 1, 2, \ldots, r$ $(1 \le i_j \le n)$.

Consider $r$ tossings with a coin. There are $2^r$ possible outcomes. The probability that no heads will show up in $r$ throws is $(\tfrac12)^r$. Similarly, the probability that no 6 will turn up in $r$ throws of a die is $(\tfrac56)^r$.
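Both probabilities in Example 3 follow from (1) by counting. The sketch below enumerates every one of the $2^r$ (or $6^r$) equally likely sequences for a small $r$, chosen here as $r = 4$, and compares the result with $(\tfrac12)^r$ and $(\tfrac56)^r$. The helper name is our own.

```python
from fractions import Fraction
from itertools import product

def prob_never(face, faces, r):
    """Enumerate all len(faces)**r equally likely sequences of r throws and return the
    probability that `face` never shows up."""
    outcomes = list(product(faces, repeat=r))
    misses = sum(face not in seq for seq in outcomes)
    return Fraction(misses, len(outcomes))

r = 4
assert prob_never("H", "HT", r) == Fraction(1, 2) ** r      # no heads in r tosses
assert prob_never(6, range(1, 7), r) == Fraction(5, 6) ** r  # no 6 in r throws of a die
print(prob_never("H", "HT", r), prob_never(6, range(1, 7), r))  # 1/16 625/1296
```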


Rule 2 is concerned with ordered samples. Consider a set of $n$ elements $a_1, a_2, \ldots, a_n$. Any ordered arrangement $(a_{i_1}, a_{i_2}, \ldots, a_{i_r})$ of $r$ of these $n$ symbols is called an ordered sample of size $r$. If elements are selected one by one, there are two possibilities:


1. Sampling with replacement. In this case repetitions are permitted, and we can draw samples of an arbitrary size. Clearly, there are $n^r$ samples of size $r$.

2. Sampling without replacement. In this case an element once chosen is not replaced, so that there can be no repetitions. Clearly, the sample size cannot exceed $n$, the size of the population. There are $n(n-1)\cdots(n-r+1) = {}_nP_r$, say, possible samples of size $r$. Clearly, ${}_nP_r = 0$ for integers $r > n$. If $r = n$, then ${}_nP_r = n!$.


Rule 2. If ordered samples of size $r$ are drawn from a population of $n$ elements, there are $n^r$ different samples with replacement and ${}_nP_r$ samples without replacement.


Corollary. The number of permutations of n objects is n!. 
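Rule 2 and its corollary can be confirmed for a small population by enumeration. In the sketch below, `math.perm(n, r)` (Python 3.8+) computes ${}_nP_r$ and returns 0 when $r > n$, matching the convention above. The helper `ordered_samples` is a hypothetical name.

```python
from itertools import permutations, product
from math import factorial, perm

def ordered_samples(n, r, replacement):
    """Count ordered samples of size r from n labelled elements by enumeration."""
    population = range(n)
    if replacement:
        return sum(1 for _ in product(population, repeat=r))
    return sum(1 for _ in permutations(population, r))

n, r = 5, 3
assert ordered_samples(n, r, replacement=True) == n ** r       # 125
assert ordered_samples(n, r, replacement=False) == perm(n, r)  # 5 * 4 * 3 = 60
assert perm(n, n) == factorial(n)                              # corollary: n! permutations
print(perm(3, 5))  # 0: no samples without replacement when r > n
```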


Remark 1. We frequently use the term random sample in this book to describe the equal assignment of probability to all possible samples in sampling from a finite population. Thus, when we speak of a random sample of size $r$ from a population of $n$ elements, it means that in sampling with replacement, each of $n^r$ samples has the same probability $1/n^r$, or that in sampling without replacement, each of ${}_nP_r$ samples is assigned probability $1/{}_nP_r$.


Example 4. Consider a set of $n$ elements. A sample of size $r$ is drawn at random with replacement. Then the probability that no element appears more than once is clearly ${}_nP_r/n^r$.

Thus, if $n$ balls are to be randomly placed in $n$ cells, the probability that each cell will be occupied is $n!/n^n$.


Example 5. Consider a class of $r$ students. The birthdays of these $r$ students form a sample of size $r$ from the 365 days in the year. Then the probability that all $r$ birthdays are different is ${}_{365}P_r/(365)^r$. One can show that this probability is $< \tfrac12$ if $r = 23$.

The following table gives the values of $q_r = {}_{365}P_r/(365)^r$ for some selected values of $r$.


r        20      23      25      30      35      60
q_r      0.589   0.493   0.431   0.294   0.186   0.006
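The table can be reproduced directly from the definition of $q_r$. Python integers are exact, so ${}_{365}P_r$ and $365^r$ are computed without overflow before the final division. The function name `q` simply mirrors the notation of the text.

```python
from math import perm

def q(r, days=365):
    """q_r = days P_r / days**r, the probability that r birthdays are all different."""
    return perm(days, r) / days ** r

for r in (20, 23, 25, 30, 35, 60):
    print(r, round(q(r), 3))
```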


Next suppose that each of the $r$ students is asked for his or her birth date in order, with the instruction that as soon as a student hears his or her birth date the student is to raise a hand. Let us compute the probability that a hand is first raised when the $k$th $(k = 1, 2, \ldots, r)$ student is asked his or her birth date. Let $p_k$ be the probability that the procedure terminates at the $k$th student. Then

$$p_1 = 1 - \left(\frac{364}{365}\right)^{r-1}$$

and

$$p_k = \frac{{}_{365}P_{k-1}}{(365)^{k-1}}\left(1 - \frac{k-1}{365}\right)^{r-k+1}\left[1 - \left(\frac{365-k}{365-k+1}\right)^{r-k}\right], \quad k = 2, 3, \ldots, r.$$
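The formula for $p_k$ can be sanity-checked with exact arithmetic. The procedure stops at some student exactly when two birthdays coincide, so $\sum_{k=1}^{r} p_k$ should equal $1 - q_r$. The sketch assumes $r \le 365$; its helper names are our own.

```python
from fractions import Fraction
from math import perm

def q(r, days=365):
    """Probability that r birthdays are all different (exact)."""
    return Fraction(perm(days, r), days ** r)

def p(k, r, days=365):
    """p_k: probability that the first hand is raised when the kth of r students
    (r <= days) announces a birth date, following the formula above."""
    first_distinct = Fraction(perm(days, k - 1), days ** (k - 1))  # first k-1 dates all different
    rest_avoid = (1 - Fraction(k - 1, days)) ** (r - k + 1)       # students k..r avoid those dates
    left = days - k + 1                                           # dates still possible for them
    later_match = 1 - Fraction(left - 1, left) ** (r - k)         # someone after k shares k's date
    return first_distinct * rest_avoid * later_match

r = 23
probs = [p(k, r) for k in range(1, r + 1)]
# The procedure stops at some k exactly when two birthdays coincide.
assert sum(probs) == 1 - q(r)
print([round(float(x), 4) for x in probs[:5]])
```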


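Example 6. Let $\Omega$ be the set of all permutations of $n$ objects. Let $A_i$ be the set of all permutations that leave the $i$th object unchanged. Then the set $\bigcup_{i=1}^{n} A_i$ is the set of permutations with at least one fixed point. Clearly,

$$PA_i = \frac{(n-1)!}{n!}, \quad i = 1, 2, \ldots, n,$$

$$P(A_i \cap A_j) = \frac{(n-2)!}{n!}, \quad i < j;\ i, j = 1, 2, \ldots, n, \quad \text{etc.}$$

By Theorem 1.3.3 we have

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}.$$

As an application, consider an absentminded secretary who places $n$ letters in $n$ envelopes at random. Then the probability that he or she will misplace every letter is

$$1 - P\Big(\bigcup_{i=1}^{n} A_i\Big) = \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n}\frac{1}{n!}.$$

It is easy to see that this last probability $\to e^{-1} = 0.3679$ as $n \to \infty$.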
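For small $n$ the inclusion-exclusion answer can be compared with a brute-force count over all $n!$ permutations. The sketch also shows how quickly the probability settles near $e^{-1}$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def prob_all_misplaced(n):
    """1/2! - 1/3! + ... + (-1)^n/n!, i.e. sum_{k=0}^{n} (-1)^k / k!."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

def brute_force(n):
    """Fraction of the n! permutations of range(n) that have no fixed point."""
    perms = list(permutations(range(n)))
    derangements = sum(all(p[i] != i for i in range(n)) for p in perms)
    return Fraction(derangements, len(perms))

for n in range(1, 8):
    assert prob_all_misplaced(n) == brute_force(n)
print(float(prob_all_misplaced(10)), exp(-1))
```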


Rule 3. There are $\binom{n}{r}$ different subpopulations of size $r \le n$ from a population of $n$ elements, where

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}. \tag{2}$$


Example 7. Consider the random distribution of $r$ balls in $n$ cells. Let $A_k$ be the event that a specified cell has exactly $k$ balls, $k = 0, 1, 2, \ldots, r$; $k$ balls can be chosen in $\binom{r}{k}$ ways. We place $k$ balls in the specified cell and distribute the remaining $r - k$ balls in the $n - 1$ cells in $(n-1)^{r-k}$ ways. Thus

$$PA_k = \frac{\binom{r}{k}(n-1)^{r-k}}{n^r} = \binom{r}{k}\left(\frac{1}{n}\right)^{k}\left(1 - \frac{1}{n}\right)^{r-k}.$$
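The count in Example 7 can be confirmed by listing all $n^r$ equally likely $r$-tuples of cell numbers (as in Example 3). The check counts those tuples in which the specified cell, taken here to be cell 0, receives exactly $k$ balls. The choice $r = 5$, $n = 3$ is arbitrary.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_cell_has_k(r, n, k):
    """P A_k = C(r, k) (n - 1)^(r - k) / n^r for a specified cell (cell 0 here)."""
    return Fraction(comb(r, k) * (n - 1) ** (r - k), n ** r)

def enumerate_prob(r, n, k):
    # Each r-tuple (i_1, ..., i_r) of cell numbers is one equally likely outcome.
    outcomes = list(product(range(n), repeat=r))
    return Fraction(sum(t.count(0) == k for t in outcomes), len(outcomes))

r, n = 5, 3
for k in range(r + 1):
    assert prob_cell_has_k(r, n, k) == enumerate_prob(r, n, k)
print([str(prob_cell_has_k(r, n, k)) for k in range(r + 1)])
```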
Example 8. There are $\binom{52}{13} = 635{,}013{,}559{,}600$ different hands at bridge and $\binom{52}{5} = 2{,}598{,}960$ hands at poker.

The probability that all 13 cards in a bridge hand have different face values is

$$4^{13}\Big/\binom{52}{13}.$$

The probability that a hand at poker contains five different face values is

$$\binom{13}{5}4^5\Big/\binom{52}{5}.$$
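The counts in Example 8 are easy to verify with `math.comb` (Python 3.8+). The sketch evaluates both probabilities as floats.

```python
from math import comb

bridge_hands = comb(52, 13)
poker_hands = comb(52, 5)
p_bridge_distinct = 4 ** 13 / bridge_hands            # one card of each face value, any suit
p_poker_distinct = comb(13, 5) * 4 ** 5 / poker_hands  # five face values, any suit for each
print(f"{bridge_hands:,} {poker_hands:,}")
print(p_bridge_distinct, p_poker_distinct)
```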


Rule 4. Consider a population of $n$ elements. The number of ways in which the population can be partitioned into $k$ subpopulations of sizes $r_1, r_2, \ldots, r_k$, respectively, $r_1 + r_2 + \cdots + r_k = n$, $0 \le r_i \le n$, is given by

$$\binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}. \tag{3}$$

The numbers defined in (3) are known as multinomial coefficients.


Proof. For the proof of Rule 4, one uses Rule 3 repeatedly. Note that

$$\binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1}\binom{n - r_1}{r_2}\cdots\binom{n - r_1 - \cdots - r_{k-2}}{r_{k-1}}.$$


Example 9. In a game of bridge the probability that a hand of 13 cards contains 2 spades, 7 hearts, 3 diamonds, and 1 club is

$$\frac{\binom{13}{2}\binom{13}{7}\binom{13}{3}\binom{13}{1}}{\binom{52}{13}}.$$
Example 10. An urn contains 5 red, 3 green, 2 blue, and 4 white balls. A sample of size 8 is selected at random without replacement. The probability that the sample contains 2 red, 2 green, 1 blue, and 3 white balls is

$$\frac{\binom{5}{2}\binom{3}{2}\binom{2}{1}\binom{4}{3}}{\binom{14}{8}}.$$
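Rule 4 and Examples 9 and 10 translate directly into a few lines of Python. The sketch computes (3) both from factorials and by the repeated use of Rule 3 given in the proof. In the loop, the final factor $\binom{r_k}{r_k} = 1$ is harmless. The partition sizes $(2, 3, 5)$ are an arbitrary test case.

```python
from math import comb, factorial, prod

def multinomial(n, sizes):
    """n! / (r_1! r_2! ... r_k!) as in (3); requires sum(sizes) == n."""
    assert sum(sizes) == n
    return factorial(n) // prod(factorial(r) for r in sizes)

def multinomial_via_rule3(n, sizes):
    # Proof of Rule 4: choose r_1 of n, then r_2 of the remaining n - r_1, and so on.
    total, remaining = 1, n
    for r in sizes:
        total *= comb(remaining, r)
        remaining -= r
    return total

# Example 9: 2 spades, 7 hearts, 3 diamonds, 1 club in a 13-card bridge hand.
p9 = comb(13, 2) * comb(13, 7) * comb(13, 3) * comb(13, 1) / comb(52, 13)
# Example 10: 2 red, 2 green, 1 blue, 3 white from 5, 3, 2, 4 balls; sample of size 8.
p10 = comb(5, 2) * comb(3, 2) * comb(2, 1) * comb(4, 3) / comb(14, 8)
print(multinomial(10, [2, 3, 5]), multinomial_via_rule3(10, [2, 3, 5]), p9, p10)
```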


PROBLEMS 1.4


1. How many different words can be formed by permuting letters of the word Mississippi? How many of these start with the letters Mi?
2. An urn contains $R$ red and $W$ white marbles. Marbles are drawn from the urn one after another without replacement. Let $A_k$ be the event that a red marble is drawn for the first time on the $k$th draw. Show that

$$PA_k = \frac{R}{R + W - k + 1}\prod_{j=1}^{k-1}\left(1 - \frac{R}{R + W - j + 1}\right).$$

Let $p$ be the proportion of red marbles in the urn before the first draw. Show that $PA_k \to p(1 - p)^{k-1}$ as $R + W \to \infty$. Is this to be expected?


3. In a population of $N$ elements, $R$ are red and $W = N - R$ are white. A group of $n$ elements is selected at random. Find the probability that the group so chosen will contain exactly $r$ red elements.


4. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit number. If the numbers corresponding to all possible permutations are listed in increasing order of magnitude, find the 319th number on this list.


5. The numbers $1, 2, \ldots, n$ are arranged in random order. Find the probability that the digits $1, 2, \ldots, k$ $(k < n)$ appear as neighbors in that order.


6. A pinball table has seven holes through which a ball can drop. Five balls are played. Assuming that at each play a ball is equally likely to go down any one of the seven holes, find the probability that more than one ball goes down at least one of the holes.


7. If $2n$ boys are divided into two equal subgroups, find the probability that the two tallest boys will be (a) in different subgroups, and (b) in the same subgroup.


8. In a movie theater that can accommodate $n + k$ people, $n$ people are seated. What is the probability that $r \le n$ given seats are occupied?


9. Waiting in line for a Saturday morning movie show are $2n$ children. Tickets are priced at a quarter each. Find the probability that nobody will have to wait for change if, before a ticket is sold to the first customer, the cashier has $2k$ $(k < n)$ quarters. Assume that it is equally likely that each ticket is paid for with a quarter or a half-dollar coin.


10. Each box of a certain brand of breakfast cereal contains a small charm, with $k$ distinct charms forming a set. Assuming that the chance of drawing any particular charm is equal to that of drawing any other charm, show that the probability of finding at least one complete set of charms in a random purchase of $N \ge k$ boxes equals

$$1 - \binom{k}{1}\left(\frac{k-1}{k}\right)^N + \binom{k}{2}\left(\frac{k-2}{k}\right)^N - \binom{k}{3}\left(\frac{k-3}{k}\right)^N + \cdots + (-1)^{k-1}\binom{k}{k-1}\left(\frac{1}{k}\right)^N.$$

[Hint: Use (1.3.7).]
11. Prove Rules 1 through 4.


12. In a five-card poker game, find the probability that a hand will have:

(a) A royal flush (ace, king, queen, jack, and 10 of the same suit). 

(b) A straight flush (five cards in a sequence, all of the same suit; ace is high but 
A, 2, 3, 4, 5 is also a sequence), excluding a royal flush. 

(c) Four of a kind (four cards of the same face value). 


(d) A full house (three cards of the same face value x and two cards of the same 
face value y). 


(e) A flush (five cards of the same suit, excluding cards in a sequence). 
(f) A straight (five cards in a sequence). 


(g) Three of a kind (three cards of the same face value and two cards of different 
face values). 


(h) Two pairs. 
(i) A single pair. 


13. (a) A married couple and four of their friends enter a row of seats in a concert
hall. What is the probability that the wife will sit next to her husband if all 
possible seating arrangements are equally likely? 


(b) In part (a), suppose that the six people go to a restaurant after the concert 
and sit at a round table. What is the probability that the wife will sit next to 
her husband? 


14. Consider a town with N people. A person sends two letters to two separate
people, each of whom is asked to repeat the procedure. Thus for each letter re- 
ceived, two letters are sent out to separate persons chosen at random (irrespective 
of what happened in the past). What is the probability that in the first n stages 
the person who started the chain letter game will not receive a letter? 


15. Consider a town with N people. A person tells a rumor to a second person, who
in turn repeats it to a third person, and so on. Suppose that at each stage the 
recipient of the rumor is chosen at random from the remaining N — 1 people. 
What is the probability that the rumor will be repeated n times: 


(a) Without being repeated to any person? 
(b) Without being repeated to the originator? 


16. There were four accidents in a town during a seven-day period. Would you be
surprised if all four occurred on the same day? If each of the four occurred on a 
different day? 


17. Whereas Rules 1 and 2 of counting deal with ordered samples with or without replacement, Rule 3 concerns unordered sampling without replacement. The most difficult rule of counting deals with unordered with replacement sampling. Show that there are $\binom{n+r-1}{r}$ possible unordered samples of size $r$ from a population of $n$ elements when sampled with replacement.
1.5 CONDITIONAL PROBABILITY AND BAYES THEOREM


So far, we have computed probabilities of events on the assumption that no infor- 
mation was available about the experiment other than the sample space. Sometimes, 
however, it is known that an event H has happened. How do we use this informa- 
tion in making a statement concerning the outcome of another event A? Consider the 
following examples. 


Example 1. Let urn 1 contain one white and two black balls, and urn 2, one black and two white balls. A fair coin is tossed. If a head turns up, a ball is drawn at random from urn 1; otherwise, from urn 2. Let $E$ be the event that the ball drawn is black. The sample space is $\Omega = \{Hb_{11}, Hb_{12}, Hw_{11}, Tb_{21}, Tw_{21}, Tw_{22}\}$, where $H$ denotes head, $T$ denotes tail, $b_{ij}$ denotes the $j$th black ball in the $i$th urn, $i = 1, 2$, and so on. Each of the six outcomes has probability $\tfrac12 \cdot \tfrac13 = \tfrac16$. Then

$$PE = P\{Hb_{11}, Hb_{12}, Tb_{21}\} = \tfrac36 = \tfrac12.$$

If, however, it is known that the coin showed a head, the ball could not have been drawn from urn 2. Thus, the probability of $E$, conditional on information $H$, is $\tfrac23$.


Note that this probability equals the ratio $P\{\text{head and ball drawn black}\}/P\{\text{head}\}$.


Example 2. Let us toss two fair coins. Then the sample space of the experiment is $\Omega = \{HH, HT, TH, TT\}$. Let event $A = \{\text{both coins show same face}\}$ and $B = \{\text{at least one coin shows } H\}$. Then $PA = \tfrac24 = \tfrac12$. If $B$ is known to have happened, this information assures that $TT$ cannot happen, and $P\{A \text{ conditional on the information that } B \text{ has happened}\} = \tfrac13 = \tfrac14\big/\tfrac34 = P(A \cap B)/PB$.


Definition 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with $PH > 0$. For an arbitrary $A \in \mathcal{S}$ we shall write

$$P\{A \mid H\} = \frac{P(A \cap H)}{PH} \tag{1}$$

and call the quantity so defined the conditional probability of $A$, given $H$. Conditional probability remains undefined when $PH = 0$.
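Definition 1 translates directly into code for a finite, equally likely sample space. In the sketch below, the helper names `prob` and `cond_prob` are our own, and an event is represented as a Python predicate on outcomes. Applying it to Example 2 reproduces $PA = \tfrac12$ and $P\{A \mid B\} = \tfrac13$.

```python
from fractions import Fraction
from itertools import product

def prob(event, omega):
    """Uniform probability on a finite sample space: (#outcomes in event) / (#outcomes)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, h, omega):
    """P{A | H} = P(A ∩ H) / PH, defined only when PH > 0 (Definition 1)."""
    ph = prob(h, omega)
    if ph == 0:
        raise ValueError("conditional probability is undefined when PH = 0")
    return prob(lambda w: a(w) and h(w), omega) / ph

omega = list(product("HT", repeat=2))   # Example 2: two fair coins
same_face = lambda w: w[0] == w[1]      # event A
at_least_one_head = lambda w: "H" in w  # event B
print(prob(same_face, omega), cond_prob(same_face, at_least_one_head, omega))  # 1/2 1/3
```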


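Theorem 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with $PH > 0$. Then $(\Omega, \mathcal{S}, P_H)$, where $P_H(A) = P\{A \mid H\}$ for all $A \in \mathcal{S}$, is a probability space.


Proof. Clearly, $P_H(A) = P\{A \mid H\} \ge 0$ for all $A \in \mathcal{S}$. Also, $P_H(\Omega) = P(\Omega \cap H)/PH = 1$. If $A_1, A_2, \ldots$ is a disjoint sequence of sets in $\mathcal{S}$, then

$$P_H\Big(\sum_{i=1}^{\infty} A_i\Big) = \frac{P\Big(\sum_{i=1}^{\infty} (A_i \cap H)\Big)}{PH} = \sum_{i=1}^{\infty} \frac{P(A_i \cap H)}{PH} = \sum_{i=1}^{\infty} P_H(A_i).$$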
Remark 1. What we have done is to consider a new sample space consisting of the basic set $H$ and the σ-field $\mathcal{S}_H = \mathcal{S} \cap H$, of subsets $A \cap H$, $A \in \mathcal{S}$, of $H$. On this space we have defined a set function $P_H$ by multiplying the probability of each event by $(PH)^{-1}$. Indeed, $(H, \mathcal{S}_H, P_H)$ is a probability space.

Let $A$ and $B$ be two events with $PA > 0$, $PB > 0$. Then it follows from (1) that

$$P(A \cap B) = PA \cdot P\{B \mid A\} \quad \text{and} \quad P(A \cap B) = PB \cdot P\{A \mid B\}. \tag{2}$$


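Equations (2) may be generalized to any number of events. Let $A_1, A_2, \ldots, A_n \in \mathcal{S}$, $n \ge 2$, and assume that $P\big(\bigcap_{j=1}^{n-1} A_j\big) > 0$. Since

$$A_1 \supset (A_1 \cap A_2) \supset (A_1 \cap A_2 \cap A_3) \supset \cdots \supset \bigcap_{j=1}^{n-2} A_j \supset \bigcap_{j=1}^{n-1} A_j,$$

we see that

$$PA_1 > 0, \quad P(A_1 \cap A_2) > 0, \quad \ldots, \quad P\Big(\bigcap_{j=1}^{n-2} A_j\Big) > 0.$$

It follows that $P\big\{A_k \mid \bigcap_{j=1}^{k-1} A_j\big\}$ are well defined for $k = 2, 3, \ldots, n$.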
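Theorem 2 (Multiplication Rule). Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A_1, A_2, \ldots, A_n \in \mathcal{S}$, with $P\big(\bigcap_{j=1}^{n-1} A_j\big) > 0$. Then

$$P\Big(\bigcap_{j=1}^{n} A_j\Big) = P(A_1)\,P\{A_2 \mid A_1\}\,P\{A_3 \mid A_1 \cap A_2\} \cdots P\Big\{A_n \,\Big|\, \bigcap_{j=1}^{n-1} A_j\Big\}. \tag{3}$$

Proof. The proof is simple.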
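A small check of (3): let $A_j$ be the event that the $j$th card drawn without replacement from a standard deck is a spade. The conditional factors $12/51$ and $11/50$ come from counting the spades left after earlier spades are removed. The product should agree with the direct count from Rule 3. The example is our own illustration, not one from the text.

```python
from fractions import Fraction
from math import comb

# A_j = "the jth card drawn (without replacement) from a 52-card deck is a spade", j = 1, 2, 3.
# Multiplication rule (3): P(A_1 ∩ A_2 ∩ A_3) = P A_1 * P{A_2 | A_1} * P{A_3 | A_1 ∩ A_2}.
chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Direct count: choose the 3 spades, out of all unordered 3-card draws (Rule 3).
direct = Fraction(comb(13, 3), comb(52, 3))

assert chain == direct
print(chain)  # 11/850
```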
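Let us suppose that $\{H_j\}$ is a countable collection of events in $\mathcal{S}$ such that $H_j \cap H_k = \emptyset$, $j \ne k$, and $\sum_{j=1}^{\infty} H_j = \Omega$. Suppose that $PH_j > 0$ for all $j$. Then

$$PB = \sum_{j=1}^{\infty} P(H_j)\,P\{B \mid H_j\} \quad \text{for all } B \in \mathcal{S}. \tag{4}$$

For the proof we note that

$$B = \sum_{j=1}^{\infty} (B \cap H_j),$$

and the result follows from countable additivity and (2). Equation (4) is called the total probability rule.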
Example 3. Consider a hand of five cards in a game of poker. If the cards are dealt at random, there are $\binom{52}{5}$ possible hands of five cards each. Let $A = \{\text{at least 3 cards of spades}\}$, $B = \{\text{all 5 cards of spades}\}$. Then

$$P(A \cap B) = P\{\text{all 5 cards of spades}\} = \binom{13}{5}\Big/\binom{52}{5}$$

and

$$P\{B \mid A\} = \frac{P(A \cap B)}{PA} = \frac{\binom{13}{5}\Big/\binom{52}{5}}{\Big[\binom{13}{3}\binom{39}{2} + \binom{13}{4}\binom{39}{1} + \binom{13}{5}\Big]\Big/\binom{52}{5}}.$$
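Evaluating Example 3 numerically: the sketch counts hands with at least three spades by summing over exactly 3, 4, or 5 spades, then forms the ratio. The common factor $\binom{52}{5}$ cancels, so the answer is $1287/241098$, roughly $0.0053$.

```python
from math import comb

def at_least_three_spades():
    """Number of 5-card hands with 3, 4 or 5 spades (13 spades, 39 non-spades)."""
    return sum(comb(13, s) * comb(39, 5 - s) for s in (3, 4, 5))

hands = comb(52, 5)
p_a_and_b = comb(13, 5) / hands  # B (all 5 spades) is contained in A
p_a = at_least_three_spades() / hands
print(p_a_and_b / p_a)           # P{B | A}
```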
Example 4. Urn 1 contains one white and two black marbles, urn 2 contains one black and two white marbles, and urn 3 contains three black and three white marbles. A die is rolled. If a 1, 2, or 3 shows up, urn 1 is selected; if a 4 shows up, urn 2 is selected; and if a 5 or 6 shows up, urn 3 is selected. A marble is then drawn at random from the urn selected. Let $A$ be the event that the marble drawn is white. If $U$, $V$, $W$, respectively, denote the events that the urn selected is 1, 2, 3, then

$$A = (A \cap U) + (A \cap V) + (A \cap W),$$

$$P(A \cap U) = P(U) \cdot P\{A \mid U\} = \tfrac36 \cdot \tfrac13,$$

$$P(A \cap V) = P(V) \cdot P\{A \mid V\} = \tfrac16 \cdot \tfrac23,$$

$$P(A \cap W) = P(W) \cdot P\{A \mid W\} = \tfrac26 \cdot \tfrac36.$$

It follows that

$$PA = \tfrac16 + \tfrac19 + \tfrac16 = \tfrac49.$$
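The computation in Example 4 is the total probability rule (4) with three hypotheses. A sketch with exact fractions follows. The dictionary layout (probability of the urn, number of white marbles, number of black marbles) is our own encoding of the example.

```python
from fractions import Fraction as F

# Example 4: urn -> (P(urn), white marbles, black marbles).
urns = {"U": (F(3, 6), 1, 2), "V": (F(1, 6), 2, 1), "W": (F(2, 6), 3, 3)}

def prob_white(urns):
    """Total probability rule (4): PA = sum over urns of P(urn) * P{white | urn}."""
    return sum(p_urn * F(white, white + black) for p_urn, white, black in urns.values())

print(prob_white(urns))  # 4/9
```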


A simple consequence of the total probability rule is the Bayes rule, which we 
now prove. 


Theorem 3 (Bayes Rule). Let $\{H_n\}$ be a disjoint sequence of events such that $PH_n > 0$, $n = 1, 2, \ldots$, and $\sum_{n=1}^{\infty} H_n = \Omega$. Let $B \in \mathcal{S}$ with $PB > 0$. Then

$$P\{H_j \mid B\} = \frac{P(H_j)\,P\{B \mid H_j\}}{\sum_{i=1}^{\infty} P(H_i)\,P\{B \mid H_i\}}, \quad j = 1, 2, \ldots. \tag{5}$$

Proof. From (2)

$$P(B \cap H_j) = P(B)\,P\{H_j \mid B\} = PH_j\,P\{B \mid H_j\},$$

and it follows that

$$P\{H_j \mid B\} = \frac{PH_j\,P\{B \mid H_j\}}{PB}.$$

The result now follows on using (4).


Remark 2. Suppose that $H_1, H_2, \ldots$ are all the “causes” that lead to the outcome of a random experiment. Let $H_j$ be the set of outcomes corresponding to the $j$th cause. Assume that the probabilities $PH_j$, $j = 1, 2, \ldots$, called the prior probabilities, can be assigned. Now suppose that the experiment results in an event $B$ of positive probability. This information leads to a reassessment of the prior probabilities. The conditional probabilities $P\{H_j \mid B\}$ are called the posterior probabilities. Formula (5) can be interpreted as a rule giving the probability that observed event $B$ was due to cause or hypothesis $H_j$.


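Example 5. In Example 4, let us compute the conditional probability $P\{V \mid A\}$. We have

$$P\{V \mid A\} = \frac{PV\,P\{A \mid V\}}{PU\,P\{A \mid U\} + PV\,P\{A \mid V\} + PW\,P\{A \mid W\}} = \frac{\tfrac16 \cdot \tfrac23}{\tfrac49} = \frac{1/9}{4/9} = \frac14.$$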
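Formula (5) for finitely many hypotheses is a short function. In the sketch below, the name `posterior` is our own, and the input lists hold the priors $PH_j$ and the likelihoods $P\{B \mid H_j\}$. Applied to Examples 4 and 5, it returns the posterior probabilities of urns 1, 2, 3 given a white marble; the middle entry is $P\{V \mid A\} = \tfrac14$.

```python
from fractions import Fraction as F

def posterior(priors, likelihoods):
    """Bayes rule (5) for finitely many hypotheses H_1, ..., H_m:
    P{H_j | B} = P(H_j) P{B | H_j} / sum_i P(H_i) P{B | H_i}."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B ∩ H_j), by (2)
    total = sum(joint)                                     # PB, by the total probability rule (4)
    if total == 0:
        raise ValueError("PB must be positive")
    return [j / total for j in joint]

# Examples 4 and 5: urns U, V, W chosen with probabilities 3/6, 1/6, 2/6; B = "white drawn".
post = posterior([F(3, 6), F(1, 6), F(2, 6)], [F(1, 3), F(2, 3), F(3, 6)])
print([str(x) for x in post])  # ['3/8', '1/4', '3/8']
```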


PROBLEMS 1.5 


1. Let $A$ and $B$ be two events such that $PA = p_1 > 0$, $PB = p_2 > 0$, and $p_1 + p_2 > 1$. Show that $P\{B \mid A\} \ge 1 - [(1 - p_2)/p_1]$.


2. Two digits are chosen at random without replacement from the set of integers 
{1, 2, 3, 4, 5, 6, 7, 8}. 
(a) Find the probability that both digits are greater than 5. 
(b) Show that the probability that the sum of the digits will be equal to 5 is the same as the probability that their sum will exceed 13.


3. The probability of a family chosen at random having exactly $k$ children is $\alpha p^k$, $0 < p < 1$. Suppose that the probability that any child has blue eyes is $b$, $0 < b < 1$, independently of others. What is the probability that a family chosen at random has exactly $r$ $(r \ge 0)$ children with blue eyes?


4. In Problem 3, let us write

$$p_k = \text{probability of a randomly chosen family having exactly } k \text{ children} = \alpha p^k, \quad k = 1, 2, \ldots,$$

$$p_0 = 1 - \frac{\alpha p}{1 - p}.$$

Suppose that all gender distributions of $k$ children are equally likely. Find the probability that a family has exactly $r$ boys, $r \ge 1$. Find the conditional probability that a family has at least two boys, given that it has at least one boy.


5. Each of $(N + 1)$ identical urns marked $0, 1, 2, \ldots, N$ contains $N$ balls. The $k$th urn contains $k$ black and $N - k$ white balls, $k = 0, 1, 2, \ldots, N$. An urn is chosen at random, and $n$ random drawings are made from it, the ball drawn always being replaced. If all the $n$ draws result in black balls, find the probability that the $(n + 1)$th draw will also produce a black ball. How does this probability behave as $N \to \infty$?


6. Each of $n$ urns contains four white and six black balls, while another urn contains five white and five black balls. An urn is chosen at random from the $(n + 1)$ urns, and two balls are drawn from it, both being black. The probability that five white and three black balls remain in the chosen urn is $\tfrac17$. Find $n$.


7. In answering a question on a multiple-choice test, a candidate either knows the answer with probability $p$ $(0 < p < 1)$ or does not know the answer with probability $1 - p$. If he knows the answer, he puts down the correct answer with probability 0.99, whereas if he guesses, the probability of his putting down the correct result is $1/k$ ($k$ choices to the answer). Find the conditional probability that the candidate knew the answer to a question, given that he has made the correct answer. Show that this probability tends to 1 as $k \to \infty$.


8. An urn contains five white and four black balls. Four balls are transferred to a second urn. A ball is then drawn from this urn, and it happens to be black. Find the probability of drawing a white ball from among the remaining three.


9. Prove Theorem 2.


10. An urn contains $r$ red and $g$ green marbles. A marble is drawn at random and its color noted. Then the marble drawn, together with $c > 0$ marbles of the same color, are returned to the urn. Suppose that $n$ such draws are made from the urn. Find the probability of selecting a red marble at any draw.


11. Consider a bicyclist who leaves a point $P$ (see Fig. 1), choosing one of the roads $PR_1$, $PR_2$, $PR_3$ at random. At each subsequent crossroad she again chooses a road at random.
Fig. 1. Map for Problem 11.


(a) What is the probability that she will arrive at point A? 
(b) What is the conditional probability that she will arrive at A via road P R4? 


12. Five percent of patients suffering from a certain disease are selected to undergo a
new treatment that is believed to increase the recovery rate from 30 percent to 50
percent. A person is randomly selected from these patients after the completion
of the treatment and is found to have recovered. What is the probability that the
patient received the new treatment?


13. Four roads lead away from the county jail. A prisoner has escaped from the jail
and selects a road at random. If road I is selected, the probability of escaping is
i if road II is selected, the probability of success is 1; if road III is selected, the
probability of escaping is 1; and if road IV is selected, the probability of success
is B.

(a) What is the probability that the prisoner will succeed in escaping?
(b) If the prisoner succeeds, what is the probability that the prisoner escaped by
using road IV? By using road I?


14. A diagnostic test for a certain disease is 95 percent accurate, in that if a person
has the disease, it will detect it with a probability of 0.95, and if a person does not
have the disease, it will give a negative result with a probability of 0.95. Suppose
that only 0.5 percent of the population has the disease in question. A person is
chosen at random from this population. The test indicates that this person has
the disease. What is the (conditional) probability that he or she does have the
disease?
Let (Ω, S, P) be a probability space, and let A, B ∈ S, with PB > 0. By the
multiplication rule we have

$$P(A \cap B) = P(B)P\{A \mid B\}.$$
In many experiments the information provided by B does not affect the probability
of event A; that is, P{A | B} = P(A).


Example 1. Let two fair coins be tossed, and let A = {head on the second throw}, 
B = {head on the first throw}. Then 


$$P(A) = P\{HH, TH\} = \frac{1}{2}, \qquad P(B) = P\{HH, HT\} = \frac{1}{2},$$

and

$$P\{A \mid B\} = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} = P(A).$$

Thus

$$P(A \cap B) = P(A)P(B).$$
In the following, we write A ∩ B = AB.


Definition 1. Two events, A and B, are said to be independent if and only if

$$P(AB) = P(A)P(B). \tag{1}$$


Note that we have not placed any restriction on P(A) or P(B). Thus conditional
probability is not defined when P(A) or P(B) = 0, but independence is. Clearly,
if P(A) = 0, then A is independent of every E ∈ S. Also, any event A ∈ S is
independent of ∅ and Ω.


Theorem 1. If A and B are independent events, then

$$P\{A \mid B\} = P(A) \quad \text{if } P(B) > 0$$

and

$$P\{B \mid A\} = P(B) \quad \text{if } P(A) > 0.$$


Theorem 2. If A and B are independent, so are A and B^c, A^c and B, and A^c
and B^c.


Proof.

$$\begin{aligned}
P(A^cB) &= P(B - (A \cap B)) \\
&= P(B) - P(A \cap B) \qquad \text{since } B \supseteq (A \cap B) \\
&= P(B)[1 - P(A)] \\
&= P(A^c)P(B).
\end{aligned}$$

Similarly, one proves that A^c and B^c, and A and B^c, are independent.
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We wish to emphasize that independence of events is not to be confused with 
disjoint or mutually exclusive events. If two events, each with nonzero probability, 
are mutually exclusive, they are obviously dependent since the occurrence of one 
will automatically preclude the occurrence of the other. Similarly, if A and B are 
independent and PA > 0, PB > 0, then A and B cannot be mutually exclusive. 


Example 2. A card is chosen at random from a deck of 52 cards. Let A be the 
event that the card is an ace, and B, the event that it is a club. Then 


$$P(A) = \frac{4}{52} = \frac{1}{13}, \qquad P(B) = \frac{13}{52} = \frac{1}{4},$$

and

$$P(AB) = P\{\text{ace of clubs}\} = \frac{1}{52},$$

so that A and B are independent.
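As a quick check of Example 2 and of Theorem 2, the following Python sketch enumerates a 52-card deck with exact fractions and tests the product rule for the ace and club events and for their complements. The deck encoding and the helper names (`prob`, `independent`, `not_`) are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from itertools import product

# A 52-card deck as (rank, suit) pairs; every card has probability 1/52.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))

def prob(event):
    """Probability of an event, given as a predicate on cards, under equal weights."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def independent(e1, e2):
    """Check the product rule P(E1 E2) = P(E1) P(E2) exactly."""
    both = lambda card: e1(card) and e2(card)
    return prob(both) == prob(e1) * prob(e2)

def not_(event):
    """Complement of an event."""
    return lambda card: not event(card)

is_ace = lambda card: card[0] == "A"
is_club = lambda card: card[1] == "clubs"

print(prob(is_ace), prob(is_club), prob(lambda c: is_ace(c) and is_club(c)))
# Theorem 2: the complements of independent events are also independent.
print(independent(is_ace, is_club),
      independent(is_ace, not_(is_club)),
      independent(not_(is_ace), is_club),
      independent(not_(is_ace), not_(is_club)))
```

Exact `Fraction` arithmetic is used so that the product rule is tested as an equality rather than up to rounding.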

Example 3. Consider families with two children, and assume that all four
possible gender distributions, BB, BG, GB, GG, where B stands for boy and
G for girl, are equally likely. Let E be the event that a randomly chosen family has
at most one girl, and F, the event that the family has children of both genders. Then

$$P(E) = \frac{3}{4}, \qquad P(F) = \frac{1}{2}, \qquad \text{and} \qquad P(EF) = \frac{1}{2},$$

so that P(E)P(F) = 3/8 ≠ P(EF), and E and F are not independent.

Now consider families with three children. Assuming that each of the eight possible
gender distributions is equally likely, we have

$$P(E) = \frac{1}{2}, \qquad P(F) = \frac{3}{4}, \qquad \text{and} \qquad P(EF) = \frac{3}{8},$$

so that E and F are independent.
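Both computations in Example 3 can be reproduced by brute-force enumeration. This is a minimal sketch assuming, as in the example, that all 2^n gender sequences are equally likely; the function name `check_families` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def check_families(n_children):
    """Return (P(E), P(F), P(EF)) for families with n_children children.

    E = at most one girl, F = children of both genders; all 2**n
    gender sequences are assumed equally likely.
    """
    outcomes = list(product("BG", repeat=n_children))
    total = len(outcomes)
    in_e = lambda w: w.count("G") <= 1
    in_f = lambda w: "B" in w and "G" in w
    p_e = Fraction(sum(in_e(w) for w in outcomes), total)
    p_f = Fraction(sum(in_f(w) for w in outcomes), total)
    p_ef = Fraction(sum(in_e(w) and in_f(w) for w in outcomes), total)
    return p_e, p_f, p_ef

for n in (2, 3):
    p_e, p_f, p_ef = check_families(n)
    # Last column: does the product rule P(EF) = P(E)P(F) hold?
    print(n, p_e, p_f, p_ef, p_ef == p_e * p_f)
```

For two children the product rule fails (3/8 versus 1/2); for three children it holds exactly, matching the text.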

An obvious extension of the concept of independence between two events A and
B to a given collection $\mathcal{U}$ of events is to require that any two distinct events in $\mathcal{U}$ be
independent.


Definition 2. Let $\mathcal{U}$ be a family of events from S. We say that the events $\mathcal{U}$ are
pairwise independent if and only if for every pair of distinct events A, B ∈ $\mathcal{U}$,

$$P(AB) = PA\,PB.$$
A much stronger and more useful concept is mutual or complete independence. 
Definition 3. A family of events $\mathcal{U}$ is said to be a mutually or completely
independent family if and only if for every finite subcollection $\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ of $\mathcal{U}$,
the following relation holds:

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} PA_{i_j}. \tag{2}$$


In what follows we omit the adjective mutual or complete and speak of independent
events. It is clear from Definition 3 that to check the independence of n events
A_1, A_2, ..., A_n ∈ S, we must check the following 2^n − n − 1 relations:

$$\begin{aligned}
P(A_iA_j) &= PA_i\,PA_j, && i \ne j;\ i, j = 1, 2, \ldots, n, \\
P(A_iA_jA_k) &= PA_i\,PA_j\,PA_k, && i, j, k \text{ distinct};\ i, j, k = 1, 2, \ldots, n, \\
&\ \ \vdots \\
P(A_1A_2 \cdots A_n) &= PA_1\,PA_2 \cdots PA_n.
\end{aligned}$$


The first of these requirements is pairwise independence. Independence therefore 
implies pairwise independence, but not conversely. 


Example 4 (Wong [119]). Take four identical marbles. On the first, write symbols
A_1A_2A_3. On each of the other three, write A_1, A_2, A_3, respectively. Put the four
marbles in an urn and draw one at random. Let E_i denote the event that the symbol
A_i appears on the drawn marble. Then

$$P(E_1) = P(E_2) = P(E_3) = \frac{1}{2},$$

$$P(E_1E_2) = P(E_2E_3) = P(E_1E_3) = \frac{1}{4},$$

and

$$P(E_1E_2E_3) = \frac{1}{4}. \tag{3}$$


Since P(E_1)P(E_2)P(E_3) = 1/8 ≠ 1/4, it follows that although events E_1, E_2, E_3
are not independent, they are pairwise independent.
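Definition 3 can be checked mechanically on a finite space by testing the product rule on every subcollection of two or more events; there are 2^n − n − 1 such subcollections. The sketch below does this for Wong's marbles. Encoding each marble as the set of symbol indices written on it is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Wong's four marbles: each set lists the indices i of the symbols A_i on one marble.
MARBLES = [{1, 2, 3}, {1}, {2}, {3}]

def prob(symbols):
    """P(every symbol index in `symbols` appears on the drawn marble)."""
    hits = sum(1 for marble in MARBLES if symbols <= marble)
    return Fraction(hits, len(MARBLES))

def product_rule_holds(indices):
    """Check P(E_i1 ... E_ik) == P(E_i1) ... P(E_ik) for one subcollection."""
    rhs = Fraction(1)
    for i in indices:
        rhs *= prob({i})
    return prob(set(indices)) == rhs

events = [1, 2, 3]
# All subcollections of size >= 2; there are 2**n - n - 1 of them.
subcollections = [c for k in range(2, len(events) + 1)
                  for c in combinations(events, k)]
pairwise = all(product_rule_holds(c) for c in combinations(events, 2))
mutual = all(product_rule_holds(c) for c in subcollections)
print(len(subcollections), pairwise, mutual)
```

The pairwise relations all hold, but the single triple relation fails, which is exactly why the events are pairwise but not mutually independent.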


Example 5 (Kac [46], pp. 22–23). In this example P(E_1E_2E_3) = P(E_1)
P(E_2)P(E_3), but E_1, E_2, E_3 are not pairwise independent and hence not
independent. Let Ω = {1, 2, 3, 4}, and let p_i be the probability assigned to {i}, i = 1, 2, 3, 4.
Let p_1 = √2/2 − 1/4, p_2 = 1/4, p_3 = 3/4 − √2/2, p_4 = 1/4. Let E_1 = {1, 3}, E_2 = {2, 3},
E_3 = {3, 4}. Then

$$\begin{aligned}
P(E_1E_2E_3) = P\{3\} &= \frac{3}{4} - \frac{\sqrt{2}}{2} = \frac{1}{2}\left(1 - \frac{\sqrt{2}}{2}\right)\left(1 - \frac{\sqrt{2}}{2}\right) \\
&= (p_1 + p_3)(p_2 + p_3)(p_3 + p_4) \\
&= P(E_1)P(E_2)P(E_3).
\end{aligned}$$


But P(E_1E_2) = P{3} = 3/4 − √2/2 ≠ P(E_1)P(E_2) = 1/2 − √2/4, and it follows that
E_1, E_2, E_3 are not independent.
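A floating-point check of Kac's example, using the point masses p_1, ..., p_4 given above: the product rule holds for the triple E_1E_2E_3 but fails for the pair E_1, E_2. Comparing with `math.isclose` is an illustrative choice, since √2 cannot be represented exactly.

```python
import math

root2 = math.sqrt(2)
# Point masses p_1, ..., p_4 from Example 5.
p = {1: root2 / 2 - 0.25, 2: 0.25, 3: 0.75 - root2 / 2, 4: 0.25}
E1, E2, E3 = {1, 3}, {2, 3}, {3, 4}

def P(event):
    """Probability of a subset of Omega = {1, 2, 3, 4}."""
    return sum(p[i] for i in event)

triple = P(E1 & E2 & E3)
print(math.isclose(sum(p.values()), 1.0))               # masses sum to 1
print(math.isclose(triple, P(E1) * P(E2) * P(E3)))      # product rule for the triple
print(math.isclose(P(E1 & E2), P(E1) * P(E2)))          # product rule for the pair E1, E2
```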
Example 6. A die is rolled repeatedly until a 6 turns up. We will show that event
A, that “a 6 will eventually show up,” is certain to occur. Let A_k be the event that a 6
will show up for the first time on the kth throw. Let A = ⋃_{k=1}^∞ A_k. Then

$$P(A_k) = \frac{1}{6}\left(\frac{5}{6}\right)^{k-1}, \qquad k = 1, 2, \ldots,$$

and, since the A_k are disjoint,

$$P(A) = \sum_{k=1}^{\infty} \frac{1}{6}\left(\frac{5}{6}\right)^{k-1} = \frac{1}{6} \cdot \frac{1}{1 - 5/6} = 1.$$

Alternatively, we can use the corollary to Theorem 1.3.6. Let B_n be the event that
a 6 does not show up on the first n trials. Clearly, B_{n+1} ⊆ B_n, and we have
A^c = ⋂_{n=1}^∞ B_n. Thus

$$1 - PA = PA^c = P\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty}\left(\frac{5}{6}\right)^n = 0.$$
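A small exact-arithmetic sketch of Example 6: the partial sums of P(A_k) agree with 1 − P(B_n) = 1 − (5/6)^n, which tends to 1. The helper name `prob_six_by` is hypothetical.

```python
from fractions import Fraction

def prob_six_by(n):
    """P(A_1 ∪ ... ∪ A_n): a 6 appears within the first n throws,
    computed as the partial sum of P(A_k) = (5/6)**(k-1) * (1/6)."""
    return sum(Fraction(5, 6) ** (k - 1) * Fraction(1, 6) for k in range(1, n + 1))

for n in (1, 5, 20, 50):
    partial = prob_six_by(n)
    # The partial sum equals 1 - P(B_n), where P(B_n) = (5/6)**n.
    assert partial == 1 - Fraction(5, 6) ** n
    print(n, float(partial))
```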


Example 7. A slip of paper is given to person A, who marks it with either a plus
or minus sign; the probability of her writing a plus sign is 1/3. A passes the slip to
B, who may either leave it alone or change the sign before passing it to C. Next, C
passes the slip to D after perhaps changing the sign; finally, D passes it to a referee
after perhaps changing the sign. The referee sees a plus sign on the slip. It is known
that B, C, and D each change the sign with probability 2/3. We shall compute the
probability that A originally wrote a plus.

Let N be the event that A wrote a plus sign, and M, the event that she wrote a
minus sign. Let E be the event that the referee saw a plus sign on the slip. We have

$$P\{N \mid E\} = \frac{P(N)P\{E \mid N\}}{P(N)P\{E \mid N\} + P(M)P\{E \mid M\}}.$$

Now

$$P\{E \mid N\} = P\{\text{the plus sign was either not changed or changed exactly twice}\} = \left(\frac{1}{3}\right)^3 + 3\left(\frac{2}{3}\right)^2\frac{1}{3} = \frac{13}{27},$$

$$P\{E \mid M\} = P\{\text{the minus sign was changed either once or three times}\} = 3 \cdot \frac{2}{3}\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^3 = \frac{14}{27},$$

and

$$P\{N \mid E\} = \frac{(1/3)(13/27)}{(1/3)(13/27) + (2/3)(14/27)} = \frac{13}{41}.$$
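The conditional probabilities in Example 7 can be checked by enumerating all 2^3 change/no-change patterns for B, C, and D. This sketch assumes that A writes a plus with probability 1/3 and that each of B, C, D independently changes the sign with probability 2/3; the constant and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

P_PLUS = Fraction(1, 3)      # P(N): A writes a plus sign (assumed value)
P_CHANGE = Fraction(2, 3)    # each of B, C, D changes the sign (assumed value)

def p_referee_sees_plus(start_plus):
    """P(E | starting sign), summing over all change patterns of B, C, D."""
    total = Fraction(0)
    for changes in product([True, False], repeat=3):
        weight = Fraction(1)
        for changed in changes:
            weight *= P_CHANGE if changed else 1 - P_CHANGE
        # An even number of changes preserves the original sign.
        final_plus = start_plus if sum(changes) % 2 == 0 else not start_plus
        if final_plus:
            total += weight
    return total

p_e_n = p_referee_sees_plus(True)    # P(E | N)
p_e_m = p_referee_sees_plus(False)   # P(E | M)
posterior = P_PLUS * p_e_n / (P_PLUS * p_e_n + (1 - P_PLUS) * p_e_m)
print(p_e_n, p_e_m, posterior)       # 13/27 14/27 13/41
```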
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PROBLEMS 1.6 
1. A biased coin is tossed until a head appears for the first time. Let p be the
probability of a head, 0 < p < 1. What is the probability that the number of
tosses required is odd? Even?


2. Let A and B be two independent events defined on some probability space, and
let PA = 1, PB = 3. Find (a) P(A ∪ B), (b) P{A | A ∪ B}, and (c) P{B | A ∪ B}.


3. Let A_1, A_2, A_3 be three independent events. Show that A_1^c, A_2^c, and A_3^c are
independent.


4. A biased coin with probability p, 0 < p < 1, of success (heads) is tossed until,
for the first time, the same result occurs three times in succession (that is, three
heads or three tails in succession). Find the probability that the game will end at
the seventh throw.


5. A box contains 20 black and 30 green balls. One ball at a time is drawn at
random, its color is noted, and the ball is then replaced in the box for the next draw.

(a) Find the probability that the first green ball is drawn on the fourth draw.

(b) Find the probability that the third and fourth green balls are drawn on the
sixth and ninth draws, respectively.

(c) Let N be the trial at which the fifth green ball is drawn. Find the probability
that the fifth green ball is drawn on the nth draw. (Note that N takes values
5, 6, 7, ....)


6. An urn contains four red and four black balls. A sample of two balls is drawn
at random. If both balls drawn are of the same color, these balls are set aside
and a new sample is drawn. If the two balls drawn are of different colors, they
are returned to the urn and another sample is drawn. Assume that the draws are
independent and that the same sampling plan is pursued at each stage until all
balls are drawn.

(a) Find the probability that at least n samples are drawn before two balls of the
same color appear.

(b) Find the probability that after the first two samples are drawn, four balls are
left, two black and two red.


7. Let A, B, and C be three boxes with three, four, and five cells, respectively.
There are three yellow balls numbered 1 to 3, four green balls numbered 1 to 4,
and five red balls numbered 1 to 5. The yellow balls are placed at random in box
A, the green in B, and the red in C, with no cell receiving more than one ball.
Find the probability that only one of the boxes will show no matches.


8. A pond contains red and golden fish. There are 3000 red and 7000 golden fish,
of which 200 and 500, respectively, are tagged. Find the probability that a
random sample of 100 red and 200 golden fish will show 15 and 20 tagged fish,
respectively.


9. Let (Ω, S, P) be a probability space. Let A, B, C ∈ S with PB > 0 and PC > 0.
If B and C are independent, show that

$$P\{A \mid B\} = P\{A \mid B \cap C\}\,PC + P\{A \mid B \cap C^c\}\,PC^c.$$

Conversely, if this relation holds, P{A | BC} ≠ P{A | B}, and PA > 0, then
B and C are independent. (Strait [110])


10. Show that the converse of Theorem 2 also holds. Thus A and B are independent
if, and only if, A and B^c are independent; and so on.


11. A lot of five identical batteries is life tested. The probability assignment is
assumed to be

$$P(A) = \int_A \frac{1}{\lambda} e^{-x/\lambda}\,dx$$

for any event A ⊆ [0, ∞), where λ > 0 is a known constant. Thus the probability
that a battery fails after time t is given by

$$P(t, \infty) = \int_t^{\infty} \frac{1}{\lambda} e^{-x/\lambda}\,dx, \qquad t \ge 0.$$

If the times to failure of the batteries are independent, what is the probability
that at least one battery will be operating after t_0 hours?


12. On Ω = (a, b), −∞ < a < b < ∞, each subinterval is assigned a probability
proportional to the length of the interval. Find a necessary and sufficient
condition for two events to be independent.


13. A game of craps is played with a pair of fair dice as follows. A player rolls the
dice. If a sum of 7 or 11 shows up, the player wins; if a sum of 2, 3, or 12 shows
up, the player loses. Otherwise, the player continues to roll the pair of dice until
the sum is either 7 or the first number rolled. In the former case the player loses,
and in the latter the player wins.

(a) Find the probability that the player wins on the nth roll.
(b) Find the probability that the player wins the game.
(c) What is the probability that the game ends on (i) the first roll, (ii) the second
roll, and (iii) the third roll?


CHAPTER 2


Random Variables and Their 
Probability Distributions 


2.1 INTRODUCTION


In Chapter 1 we dealt essentially with random experiments that can be described by 
finite sample spaces. We studied the assignment and computation of probabilities of 
events. In practice, one observes a function defined on the space of outcomes. Thus, 
if a coin is tossed n times, one is not interested in knowing which of the 2^n n-tuples
in the sample space has occurred. Rather, one would like to know the number of 
heads in n tosses. In games of chance, one is interested in the net gain or loss of a 
certain player. Actually, in Chapter 1 we were concerned with such functions without 
defining the term random variable. Here we study the notion of a random variable 
and examine some of its properties. 

In Section 2.2 we define a random variable, and in Section 2.3 we study the notion 
of probability distribution of a random variable. Section 2.4 deals with some special 
types of random variables, and in Section 2.5 we consider functions of a random 
variable and their induced distributions. The fundamental difference between a ran- 
dom variable and a real-valued function of a real variable is the associated notion 
of a probability distribution. Nevertheless, our knowledge of advanced calculus or 
real analysis is the basic tool in the study of random variables and their probability 
distributions. 


2.2 RANDOM VARIABLES


In Chapter 1 we studied properties of a set function P defined on a sample space
(Ω, S). Since P is a set function, it is not very easy to handle; we cannot perform
arithmetic or algebraic operations on sets. Moreover, in practice one frequently ob- 
serves some function of elementary events. When a coin is tossed repeatedly, which 
replication resulted in heads is not of much interest. Rather, one is interested in the 
number of heads, and consequently, the number of tails, that appear in, say, n tossings 
of the coin. It is therefore desirable to introduce a point function on the sample space. 
We can then use our knowledge of calculus or real analysis to study properties of P. 


Definition 1. Let (Ω, S) be a sample space. A finite, single-valued function X that
maps Ω into R is called a random variable (RV) if the inverse images under X of all
Borel sets in R are events, that is, if

$$X^{-1}(B) = \{\omega: X(\omega) \in B\} \in S \quad \text{for all } B \in \mathfrak{B}. \tag{1}$$

To verify whether a real-valued function on (Ω, S) is an RV, it is not necessary to
check that (1) holds for all Borel sets B ∈ $\mathfrak{B}$. It suffices to verify (1) for any class $\mathfrak{A}$
of subsets of R that generates $\mathfrak{B}$. By taking $\mathfrak{A}$ to be the class of semiclosed intervals
(−∞, x], x ∈ R, we get the following result.


Theorem 1. X is an RV if and only if for each x ∈ R,

$$\{\omega: X(\omega) \le x\} = \{X \le x\} \in S. \tag{2}$$


Remark I. Note that the notion of probability does not enter into the definition 
of an RV. 


Remark 2. If X is an RV, the sets {X = x}, {a < X ≤ b}, {X < x}, {a ≤ X <
b}, {a < X < b}, {a ≤ X ≤ b} are all events. Moreover, we could have used any
of these intervals to define an RV. For example, we could have used the following
equivalent definition: X is an RV if and only if

$$\{\omega: X(\omega) < x\} \in S \quad \text{for all } x \in R. \tag{3}$$

We have

$$\{X < x\} = \bigcup_{n=1}^{\infty}\left\{X \le x - \frac{1}{n}\right\} \tag{4}$$

and

$$\{X \le x\} = \bigcap_{n=1}^{\infty}\left\{X < x + \frac{1}{n}\right\}. \tag{5}$$


Remark 3. In practice, (1) or (2) is a technical condition in the definition of an
RV which the reader may ignore and think of RVs simply as real-valued functions
defined on Ω. It should be emphasized, though, that there do exist subsets of R that
do not belong to $\mathfrak{B}$, and hence there exist real-valued functions defined on Ω that are
not RVs, but the reader will not encounter them in practical applications.


Example 1. For any set A ⊆ Ω, define

$$I_A(\omega) = \begin{cases} 0, & \omega \notin A, \\ 1, & \omega \in A. \end{cases}$$

I_A(ω) is called the indicator function of set A. I_A is an RV if and only if A ∈ S.




Example 2. Let Ω = {H, T}, and S be the class of all subsets of Ω. Define X by
X(H) = 1, X(T) = 0. Then

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset & \text{if } x < 0, \\ \{T\} & \text{if } 0 \le x < 1, \\ \{H, T\} & \text{if } 1 \le x, \end{cases}$$

and we see that X is an RV.


Example 3. Let Ω = {HH, TT, HT, TH} and S be the class of all subsets of Ω.
Define X by

X(ω) = number of H's in ω.

Then X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0.

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset, & x < 0, \\ \{TT\}, & 0 \le x < 1, \\ \{TT, HT, TH\}, & 1 \le x < 2, \\ \Omega, & 2 \le x. \end{cases}$$

Thus X is an RV.
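For a finite sample space the preimages X^{-1}(−∞, x] can simply be listed. The following sketch does this for Example 3; the string encoding of outcomes and the function names are illustrative choices. Since S is the class of all subsets of Ω, every set produced is automatically an event.

```python
OMEGA = ["HH", "TT", "HT", "TH"]

def X(omega):
    """Number of H's in the outcome."""
    return omega.count("H")

def preimage(x):
    """X^{-1}(-inf, x] = {omega : X(omega) <= x}, as a frozenset."""
    return frozenset(w for w in OMEGA if X(w) <= x)

for x in (-0.5, 0, 0.5, 1, 1.7, 2, 3):
    print(x, sorted(preimage(x)))
```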
Remark 4. Let (Ω, S) be a discrete sample space; that is, let Ω be a countable
set of points and S be the class of all subsets of Ω. Then every numerical-valued
function defined on (Ω, S) is an RV.


Example 4. Let Ω = [0, 1] and S = $\mathfrak{B}$ ∩ [0, 1] be the σ-field of Borel sets on
[0, 1]. Define X on Ω by

$$X(\omega) = \omega, \qquad \omega \in [0, 1].$$

Clearly, X is an RV. Any Borel subset of Ω is an event.

Remark 5. Let X be an RV defined on (Ω, S) and a, b be constants. Then aX + b
is also an RV on (Ω, S). Moreover, X² is an RV and so also is 1/X, provided that
{X = 0} = ∅. For a general result, see Theorem 2.5.1.


PROBLEMS 2.2 
1. Let X be the number of heads in three tosses of a coin. What is Ω? What are the
values that X assigns to points of Ω? What are the events {X < 2.75}, {0.5 <
X < 1.72}?
2. A die is tossed two times. Let X be the sum of face values on the two tosses and
Y be the absolute value of the difference in face values. What is Ω? What values
do X and Y assign to points of Ω? Check to see whether X and Y are random
variables.


3. Let X be an RV. Is |X| also an RV? If X is an RV that takes only nonnegative
values, is √X also an RV?


4. A die is rolled five times. Let X be the sum of face values. Write the events
{X = 4}, {X = 6}, {X = 30}, and {X > 29}.
5. Let Ω = [0, 1] and S be the Borel σ-field of subsets of Ω. Define X on Ω as
follows: X(w) = wif 0 <o < 5 о lif] <w <1. Is X an RV? 
If so, what is the event {w: Х(о) є ( 1, np 


6. Let $\mathfrak{A}$ be a class of subsets of R that generates $\mathfrak{B}$. Show that X is an RV on Ω if
and only if X^{−1}(A) ∈ S for all A ∈ $\mathfrak{A}$.


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In Section 2.2 we introduced the concept of an RV and noted that the concept of 
probability on the sample space was not used in this definition. In practice, however, 
random variables are of interest only when they are defined on a probability space. 
Let (Ω, S, P) be a probability space, and let X be an RV defined on it.


Theorem 1. The RV X defined on the probability space (Ω, S, P) induces a
probability space (R, $\mathfrak{B}$, Q) by means of the correspondence

$$Q(B) = P(X^{-1}(B)) = P\{\omega: X(\omega) \in B\} \quad \text{for all } B \in \mathfrak{B}. \tag{1}$$

We write Q = PX^{−1} and call Q or PX^{−1} the (probability) distribution of X.
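Proof. Clearly, Q(B) ≥ 0 for all B ∈ $\mathfrak{B}$, and also Q(R) = P{X ∈ R} =
P(Ω) = 1. Let B_i ∈ $\mathfrak{B}$, i = 1, 2, ..., with B_i ∩ B_j = ∅, i ≠ j. Since the inverse
image of a disjoint union of Borel sets is the disjoint union of their inverse images,
we have

$$Q\left(\bigcup_{i=1}^{\infty} B_i\right) = P\left\{X^{-1}\left(\bigcup_{i=1}^{\infty} B_i\right)\right\} = P\left\{\bigcup_{i=1}^{\infty} X^{-1}(B_i)\right\} = \sum_{i=1}^{\infty} PX^{-1}(B_i) = \sum_{i=1}^{\infty} Q(B_i).$$

It follows that (R, $\mathfrak{B}$, Q) is a probability space, and the proof is complete.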
We note that Q is a set function and that set functions are not easy to handle. It is
therefore more practical to use (2.2.2) since then Q(−∞, x] is a point function. Let
us first introduce and study some properties of a special point function on R.


Definition 1. A real-valued function F defined on (−∞, ∞) that is nondecreasing,
right continuous, and satisfies

$$F(-\infty) = 0 \quad \text{and} \quad F(+\infty) = 1$$

is called a distribution function (DF).
Remark 1. Recall that if F is a nondecreasing function on R, then F(x−) =
lim_{t↑x} F(t) and F(x+) = lim_{t↓x} F(t) exist and are finite. Also, F(+∞) and F(−∞)
exist as lim_{t↑+∞} F(t) and lim_{t↓−∞} F(t), respectively. In general,

$$F(x-) \le F(x) \le F(x+),$$

and x is a jump point of F if and only if F(x+) and F(x−) exist but are unequal.
Thus a nondecreasing function F has only jump discontinuities. If we define

$$F^*(x) = F(x+) \quad \text{for all } x,$$

we see that F* is nondecreasing and right continuous on R. Thus in Definition 1
the nondecreasing part is very important. Some authors demand left instead of right
continuity in the definition of a DF.


Theorem 2. The set of discontinuity points of a DF F is at most countable.

Proof. Let (a, b] be a finite interval with at least n discontinuity points:

$$a < x_1 < x_2 < \cdots < x_n \le b.$$

Then

$$F(a) \le F(x_1-) < F(x_1) \le \cdots \le F(x_n-) < F(x_n) \le F(b).$$

Let p_k = F(x_k) − F(x_k−), k = 1, 2, ..., n. Clearly,

$$\sum_{k=1}^{n} p_k \le F(b) - F(a),$$

and it follows that the number of points x in (a, b] with jump p(x) > ε > 0 is
at most ε^{−1}{F(b) − F(a)}. Thus for every integer N, the number of discontinuity
points with jump greater than 1/N is finite. It follows that there are no more than a
countable number of discontinuity points in every finite interval (a, b]. Since R is a
countable union of such intervals, the proof is complete.
Definition 2. Let X be an RV defined on (Ω, S, P). Define a point function F(·)
on R by using (1), namely,

$$F(x) = Q(-\infty, x] = P\{\omega: X(\omega) \le x\} \quad \text{for all } x \in R. \tag{2}$$

The function F is called the distribution function of RV X.

If there is no confusion, we will write 

F(x) = P{X ≤ x}.

The following result justifies our calling F as defined by (2) a DF. 

Theorem 3. The function F defined in (2) is indeed a DF. 

Proof. Let x_1 < x_2. Then (−∞, x_1] ⊂ (−∞, x_2], and we have

$$F(x_1) = P\{X \le x_1\} \le P\{X \le x_2\} = F(x_2).$$

Since F is nondecreasing, it is sufficient to show that for any sequence of numbers
x_n ↓ x, x_1 > x_2 > ⋯ > x_n > ⋯ > x, F(x_n) → F(x). Let A_k = {ω: X(ω) ∈
(x, x_k]}. Then A_k ∈ S and A_k ↓. Also,

$$\lim_{k\to\infty} A_k = \bigcap_{k=1}^{\infty} A_k = \emptyset,$$

since none of the intervals (x, x_k] contains x, and any point y > x lies outside
(x, x_k] as soon as x_k < y. It follows that lim_{k→∞} P(A_k) = 0. But

$$P(A_k) = P\{X \le x_k\} - P\{X \le x\} = F(x_k) - F(x),$$

so that

$$\lim_{k\to\infty} F(x_k) = F(x),$$

and F is right continuous.

Finally, let {x_n} be a sequence of numbers decreasing to −∞. Then

$$\{X \le x_n\} \supseteq \{X \le x_{n+1}\} \quad \text{for each } n$$

and

$$\lim_{n\to\infty} \{X \le x_n\} = \bigcap_{n=1}^{\infty} \{X \le x_n\} = \emptyset.$$

Therefore,

$$F(-\infty) = \lim_{n\to\infty} P\{X \le x_n\} = P\left\{\lim_{n\to\infty} \{X \le x_n\}\right\} = 0.$$

Similarly, for a sequence x_n increasing to +∞,

$$F(+\infty) = \lim_{n\to\infty} P\{X \le x_n\} = 1,$$

and the proof is complete.


The next result, stated without proof, establishes a correspondence between the
induced probability Q on (R, $\mathfrak{B}$) and a point function F defined on R.


Theorem 4. Given a probability Q on (R, $\mathfrak{B}$), there exists a distribution function
F satisfying

$$Q(-\infty, x] = F(x) \quad \text{for all } x \in R, \tag{3}$$

and conversely, given a DF F, there exists a unique probability Q defined on (R, $\mathfrak{B}$)
that satisfies (3).


For proof, see Chung [14, pp. 23–24].
Theorem 5. Every DF is the DF of an RV on some probability space.


Proof. Let F be a DF. From Theorem 4 it follows that there exists a unique
probability Q defined on R that satisfies

$$Q(-\infty, x] = F(x) \quad \text{for all } x \in R.$$

Let (R, $\mathfrak{B}$, Q) be the probability space on which we define

$$X(\omega) = \omega, \qquad \omega \in R.$$

Then

$$Q\{\omega: X(\omega) \le x\} = Q(-\infty, x] = F(x),$$

and F is the DF of RV X.


Remark 2. If X is an RV on (Ω, S, P), we have seen (Theorem 3) that F(x) =
P{X ≤ x} is a DF associated with X. Theorem 5 assures us that to every DF F
we can associate some RV. Thus, given an RV, there exists a DF, and conversely. In
this book when we speak of an RV we will assume that it is defined on a probability
space.
Example 1. Let X be defined on (Ω, S, P) by

X(ω) = c for all ω ∈ Ω.

Then

$$P\{X = c\} = 1, \qquad F(x) = Q(-\infty, x] = P(X^{-1}(-\infty, x]) = 0 \quad \text{if } x < c,$$

and

$$F(x) = 1 \quad \text{if } x \ge c.$$

Example 2. Let Ω = {H, T} and X be defined by

X(H) = 1, X(T) = 0.

If P assigns equal mass to {H} and {T}, then

$$P\{X = 0\} = \frac{1}{2} = P\{X = 1\}$$

and

$$F(x) = Q(-\infty, x] = \begin{cases} 0, & x < 0, \\ \tfrac{1}{2}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$


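Example 3. Let Ω = {(i, j): i, j ∈ {1, 2, 3, 4, 5, 6}} and S be the set of all
subsets of Ω. Let P{(i, j)} = 1/6² for all 6² pairs (i, j) in Ω. Define

$$X(i, j) = i + j, \qquad 1 \le i, j \le 6.$$

Then

$$F(x) = Q(-\infty, x] = P\{X \le x\} = \begin{cases} 0, & x < 2, \\ \tfrac{1}{36}, & 2 \le x < 3, \\ \tfrac{3}{36}, & 3 \le x < 4, \\ \tfrac{6}{36}, & 4 \le x < 5, \\ \ \vdots & \ \ \vdots \\ \tfrac{35}{36}, & 11 \le x < 12, \\ 1, & 12 \le x. \end{cases}$$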
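The DF in Example 3 can be tabulated exactly by counting the 36 equally likely pairs (i, j) with i + j ≤ x. The sketch below is one way to do it; the names `OUTCOMES` and `F` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Omega = {(i, j): i, j in 1..6}, each pair with probability 1/36; X(i, j) = i + j.
OUTCOMES = list(product(range(1, 7), repeat=2))

def F(x):
    """DF of X: F(x) = P{X <= x} = Q(-inf, x]."""
    count = sum(1 for i, j in OUTCOMES if i + j <= x)
    return Fraction(count, len(OUTCOMES))

for x in (1.9, 2, 3, 4, 11, 11.5, 12):
    print(x, F(x))
```

Because F counts outcomes with i + j ≤ x, it is automatically a right-continuous step function with jumps at 2, 3, ..., 12.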


Example 4. We return to Example 2.2.4. For every subinterval I of [0, 1], let
P(I) be the length of the interval. Then (Ω, S, P) is a probability space, and the DF
of RV X(ω) = ω, ω ∈ Ω, is given by F(x) = 0 if x < 0, F(x) = P{ω: X(ω) ≤
x} = P([0, x]) = x if x ∈ [0, 1], and F(x) = 1 if x > 1.
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1. Write the DF of RV X defined in Problem 2.2.1, assuming that the coin is fair. 


2. What is the DF of RV Y defined in Problem 2.2.2, assuming that the die is not 
loaded? 
3. Do the following functions define DFs? 
(a) F(x) = 0 if x < 0, = x if 0 ≤ x < 1/2, and = 1 if x ≥ 1/2.
(b) F(x) = (1/π) tan^{−1} x, −∞ < x < ∞.
(c) F(x) = 0 if x < 1, and = 1 − (1/x) if 1 ≤ x.
(d) F(x) = 1 − e^{−x} if x ≥ 0, and = 0 if x < 0.
4. Let X be an RV with DF F. 
(a) If F is the DF defined in Problem 3(a), find P{X > 4}, P(I < X < 3}. 
(b) If F is the DF defined in Problem 3(d), find P{−∞ < X < 2}.
Let X be an RV defined on some fixed but otherwise arbitrary probability space
(Ω, S, P), and let F be the DF of X. In this book we restrict ourselves mainly to two
cases: the case in which the RV assumes at most a countable number of values and
hence its DF is a step function, and that in which the DF F is (absolutely) continuous.


Definition 1. An RV X defined on (Ω, S, P) is said to be of the discrete type, or
simply discrete, if there exists a countable set E ⊆ R such that P{X ∈ E} = 1. The
points of E that have positive mass are called jump points or points of increase of
the DF of X, and their probabilities are called jumps of the DF.


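Note that E ∈ $\mathfrak{B}$ since every one-point set is in $\mathfrak{B}$ and E is a countable union of
one-point sets. Indeed, if x ∈ R, then

$$\{x\} = \bigcap_{n=1}^{\infty}\left(x - \frac{1}{n},\, x + \frac{1}{n}\right]. \tag{1}$$

Thus {X ∈ E} is an event. Let X take on the value x_i with probability p_i (i =
1, 2, ...). We have

$$P\{\omega: X(\omega) = x_i\} = p_i, \quad i = 1, 2, \ldots, \qquad p_i \ge 0 \text{ for all } i.$$

Then ∑_{i=1}^∞ p_i = 1.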


Definition 2. The collection of numbers {p_i} satisfying P{X = x_i} = p_i ≥ 0
for all i and ∑_{i=1}^∞ p_i = 1 is called the probability mass function (PMF) of RV X.
The DF F of X is given by

$$F(x) = P\{X \le x\} = \sum_{x_i \le x} p_i. \tag{2}$$

If I_A denotes the indicator function of the set A, we may write

$$X(\omega) = \sum_{i=1}^{\infty} x_i\, I_{[X = x_i]}(\omega). \tag{3}$$

Let us define a function ε(x) as follows:

$$\varepsilon(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Then we have

$$F(x) = \sum_{i=1}^{\infty} p_i\, \varepsilon(x - x_i). \tag{4}$$
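Formula (4) translates directly into code for an RV with finitely many mass points. The sketch below uses hypothetical helper names `eps` and `step_df`, and checks the result on a two-point RV of the kind in Example 2, taking p = 0.25 purely for illustration.

```python
def eps(x):
    """Unit step: 1 for x >= 0, 0 for x < 0."""
    return 1.0 if x >= 0 else 0.0

def step_df(points, masses):
    """Return F(x) = sum_i p_i * eps(x - x_i) for a discrete RV
    taking value points[i] with probability masses[i]."""
    if min(masses) < 0 or abs(sum(masses) - 1.0) > 1e-12:
        raise ValueError("masses must be nonnegative and sum to 1")

    def F(x):
        return sum(p * eps(x - xi) for xi, p in zip(points, masses))

    return F

# Two-point RV: P{X = 1} = 0.25, P{X = 0} = 0.75 (illustrative value of p).
F = step_df([0, 1], [0.75, 0.25])
print(F(-1), F(0), F(0.5), F(1), F(2))
```

Because ε is 1 at 0, each jump is included at its mass point, which gives the right continuity required of a DF.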
Example 1. The simplest example is that of an RV X degenerate at c, P{X =
c} = 1:

$$F(x) = \varepsilon(x - c) = \begin{cases} 0, & x < c, \\ 1, & x \ge c. \end{cases}$$

Example 2. A box contains good and defective items. If an item drawn is good,
we assign the number 1 to the drawing; otherwise, the number 0. Let p be the
probability of drawing at random a good item. Then

$$P\{X = 1\} = p \quad \text{and} \quad P\{X = 0\} = 1 - p,$$

and

$$F(x) = P\{X \le x\} = \begin{cases} 0, & x < 0, \\ 1 - p, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$


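Example 3. Let X be an RV with PMF

$$P\{X = k\} = \frac{6}{\pi^2} \cdot \frac{1}{k^2}, \qquad k = 1, 2, \ldots.$$

Then

$$F(x) = \frac{6}{\pi^2} \sum_{1 \le k \le x} \frac{1}{k^2}.$$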
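A numerical look at Example 3, using double-precision floats: the masses 6/(π²k²) sum to 1 because ∑ 1/k² = π²/6, but the partial sums approach 1 slowly. The function names are illustrative.

```python
import math

def pmf(k):
    """P{X = k} = 6 / (pi^2 k^2), k = 1, 2, ..."""
    return 6.0 / (math.pi ** 2 * k ** 2)

def F(x):
    """DF: sum of pmf(k) over positive integers k <= x (0 for x < 1)."""
    if x < 1:
        return 0.0
    return sum(pmf(k) for k in range(1, math.floor(x) + 1))

for x in (0.5, 1, 2.5, 10, 1000):
    print(x, F(x))
```

Even at x = 1000 the DF is still about 6 × 10⁻⁴ below 1, reflecting the heavy tail of this distribution.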
Theorem 1. Let {p_k} be a collection of nonnegative real numbers such that
∑_{k=1}^∞ p_k = 1. Then {p_k} is the PMF of some RV X.


We next consider RVs associated with DFs that have no jump points. The DF of 
such an RV is continuous. We shall restrict our attention to a special subclass of such 
RVs. 


Definition 3. Let X be an RV defined on (Ω, S, P) with DF F. Then X is said to
be of the continuous type (or simply, continuous) if F is absolutely continuous, that
is, if there exists a nonnegative function f(x) such that for every real number x we
have

$$F(x) = \int_{-\infty}^{x} f(t)\,dt. \tag{5}$$

The function f is called the probability density function (PDF) of the RV X.


Note that f ≥ 0 and satisfies lim_{x→+∞} F(x) = F(+∞) = ∫_{−∞}^{∞} f(t) dt = 1.
Let a and b be any two real numbers with a < b. Then

$$P\{a < X \le b\} = F(b) - F(a) = \int_a^b f(t)\,dt.$$


In view of remarks following Definition 2.2.1, the following result holds. 


Theorem 2. Let X be an RV of the continuous type with PDF f. Then for every
Borel set B ∈ $\mathfrak{B}$,

$$P\{X \in B\} = \int_B f(t)\,dt. \tag{6}$$

If F is absolutely continuous and f is continuous at x, we have

$$F'(x) = \frac{dF(x)}{dx} = f(x). \tag{7}$$


Theorem 3. Every nonnegative real function f that is integrable over R and
satisfies

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

is the PDF of some continuous RV X.
Proof. In view of Theorem 2.3.5, it suffices to show that there corresponds a DF
F to f. Define

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad x \in R.$$

Then F(−∞) = 0, F(+∞) = 1, and if x_2 > x_1,

$$F(x_2) = \left(\int_{-\infty}^{x_1} + \int_{x_1}^{x_2}\right) f(t)\,dt \ge \int_{-\infty}^{x_1} f(t)\,dt = F(x_1).$$

Finally, F is (absolutely) continuous and hence continuous from the right.


Remark 1. In the discrete case, P{X = a} is the probability that X takes the
value a. In the continuous case, f(a) is not the probability that X takes the value a.
Indeed, if X is of the continuous type, it assumes every value with probability 0.


Theorem 4. Let X be any RV. Then

$$P\{X = a\} = \lim_{\substack{t \to a \\ t < a}} P\{t < X \le a\}. \tag{8}$$

Proof. Let t_1 < t_2 < ⋯ < a, t_n → a, and write

$$A_n = \{t_n < X \le a\}.$$

Then A_n is a nonincreasing sequence of events that converges to ⋂_{n=1}^∞ A_n = {X =
a}. It follows that lim_{n→∞} PA_n = P{X = a}.


Remark 2. Since P{t < X ≤ a} = F(a) − F(t), it follows that

$$\lim_{\substack{t \to a \\ t < a}} P\{t < X \le a\} = P\{X = a\} = F(a) - \lim_{\substack{t \to a \\ t < a}} F(t) = F(a) - F(a-).$$

Thus F has a jump discontinuity at a if and only if P{X = a} > 0; that is, F is
continuous at a if and only if P{X = a} = 0. If X is an RV of the continuous type,
P{X = a} = 0 for all a ∈ R. Moreover,

$$P\{X \in R - \{a\}\} = 1.$$

This justifies Remark 1.3.4.


Remark 3. The set of real numbers x for which a DF F increases is called the
support of F. Let X be the RV with DF F, and let S be the support of F. Then
P(X ∈ S) = 1 and P(X ∈ S^c) = 0. The set of positive integers is the support of the
DF in Example 3, and the open interval (0, 1) is the support of F in Example 4.


Example 4. Let X be an RV with DF F given by (Fig. 1)

$$F(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x \le 1, \\ 1, & 1 < x. \end{cases}$$

Differentiating F with respect to x at continuity points of f, we get

$$f(x) = F'(x) = \begin{cases} 0, & x < 0 \text{ or } x > 1, \\ 1, & 0 < x < 1. \end{cases}$$

The function f is not continuous at x = 0 or at x = 1 (Fig. 2). We may define f(0)
and f(1) in any manner. Choosing f(0) = f(1) = 0, we have

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$P\{0.4 < X \le 0.6\} = F(0.6) - F(0.4) = 0.2.$$


Fig. 1. Graph of F(x).

Fig. 2. Graph of f(x).


Example 5. Let X have the triangular PDF (Fig. 3)

$$f(x) = \begin{cases} x, & 0 < x \le 1, \\ 2 - x, & 1 < x \le 2, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 3. Graph of f.
Fig. 4. Graph of F.


It is easy to check that f is a PDF. For the DF F of X we have (Fig. 4) 


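$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ \displaystyle\int_0^x t\,dt = \frac{x^2}{2} & \text{if } 0 \le x < 1, \\ \displaystyle\int_0^1 t\,dt + \int_1^x (2 - t)\,dt = 2x - \frac{x^2}{2} - 1 & \text{if } 1 \le x < 2, \\ 1 & \text{if } x \ge 2. \end{cases}$$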


Then

$$P\{0.3 < X \le 1.5\} = P\{X \le 1.5\} - P\{X \le 0.3\} = 0.875 - 0.045 = 0.83.$$
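To double-check Example 5, the sketch below codes the PDF and the closed-form DF, and uses a simple midpoint rule (an illustrative numerical choice, not part of the text) to confirm that f integrates to 1 and that P{0.3 < X ≤ 1.5} = 0.83.

```python
def f(x):
    """Triangular PDF of Example 5."""
    if 0 < x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def F(x):
    """Closed-form DF obtained by integrating f."""
    if x < 0:
        return 0.0
    if x < 1:
        return x * x / 2
    if x < 2:
        return 2 * x - x * x / 2 - 1
    return 1.0

def integrate(g, a, b, n=100_000):
    """Midpoint rule, used only as a numerical cross-check."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

print(integrate(f, 0, 2))   # total mass, close to 1
print(F(1.5) - F(0.3))      # P{0.3 < X <= 1.5}, close to 0.83
```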


Example 6. Let k > 0 be a constant, and

$$f(x) = \begin{cases} kx(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Then ∫_0^1 f(x) dx = k/6. It follows that f(x) defines a PDF if k = 6. We have

$$P\{X > 0.3\} = 1 - 6\int_0^{0.3} x(1 - x)\,dx = 0.784.$$
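The constant k and the probability in Example 6 can be computed exactly with rational arithmetic, using the antiderivative x²/2 − x³/3 of x(1 − x). The function and variable names are illustrative.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of x(1 - x): x^2/2 - x^3/3 (exact for Fraction input)."""
    return x ** 2 / 2 - x ** 3 / 3

zero, one, x0 = Fraction(0), Fraction(1), Fraction(3, 10)
k = 1 / (antiderivative(one) - antiderivative(zero))   # integral over (0, 1) is 1/6, so k = 6
p_gt = 1 - k * (antiderivative(x0) - antiderivative(zero))
print(k, p_gt, float(p_gt))                            # 6 98/125 0.784
```

All arguments are `Fraction` objects so that no floating-point value enters until the final `float` conversion.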
We conclude this discussion by emphasizing that the two types of RVs considered
above form only a part of the class of all RVs. These two classes, however, contain
practically all the random variables that arise in practice. We note without proof (see
Chung [14, p. 9]) that every DF F can be decomposed into two parts according to

$$F(x) = \alpha F_d(x) + (1 - \alpha) F_c(x), \qquad 0 \le \alpha \le 1. \tag{9}$$

Here F_d and F_c are both DFs; F_d is the DF of a discrete RV, while F_c is a continuous
(not necessarily absolutely continuous) DF. In fact, F_c can be further decomposed,
but we will not go into that (see Chung [14, p. 11]).


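Example 7. Let $X$ be an RV with DF

$$F(x) = \begin{cases} 0, & x < 0, \\ \frac12, & x = 0, \\ \frac12 + \frac{x}{2}, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$$

Note that the DF $F$ has a jump at $x = 0$ and $F$ is continuous (in fact, absolutely continuous) in the interval $(0, 1)$. $F$ is the DF of an RV $X$ that is neither discrete nor continuous. We can write

$$F(x) = \tfrac12 F_d(x) + \tfrac12 F_c(x),$$

where

$$F_d(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

and

$$F_c(x) = \begin{cases} 0, & x < 0, \\ x, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$

Here $F_d(x)$ is the DF of the RV degenerate at $x = 0$, and $F_c(x)$ is the DF with PDF

$$f_c(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$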
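The decomposition in Example 7 is an instance of (9) with $\alpha = \frac12$. A minimal sketch (function names are ours) builds $F$ from its discrete and continuous parts and exposes the jump of size $\frac12$ at $x = 0$:

```python
def F_d(x: float) -> float:
    """DF of the RV degenerate at 0."""
    return 1.0 if x >= 0 else 0.0


def F_c(x: float) -> float:
    """DF with PDF f_c(x) = 1 on (0, 1), i.e. uniform on (0, 1)."""
    return min(max(x, 0.0), 1.0)


def F(x: float) -> float:
    """Mixed DF of Example 7: F = (1/2) F_d + (1/2) F_c."""
    return 0.5 * F_d(x) + 0.5 * F_c(x)


for x in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(x, F(x))
```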
PROBLEMS 2.4

1. Let
$$p_k = p(1 - p)^k, \qquad k = 0, 1, 2, \ldots, \quad 0 < p < 1.$$
Does $\{p_k\}$ define the PMF of some RV? What is the DF of this RV? If $X$ is an RV with PMF $\{p_k\}$, what is $P\{n \le X \le N\}$, where $n, N$ $(N > n)$ are positive integers?


2. In Problem 2.3.3, find the PDF associated with the DFs of parts (b), (c), and (d). 


3. Does the function $f_\theta(x) = \theta^2 x e^{-\theta x}$ if $x > 0$, and $= 0$ if $x \le 0$, where $\theta > 0$, define a PDF? Find the DF associated with $f_\theta(x)$; if $X$ is an RV with PDF $f_\theta(x)$, find $P\{X > 1\}$.

4. Does the function $f_\theta(x) = \{(x + 1)/[\theta(\theta + 1)]\}e^{-x/\theta}$ if $x > 0$, and $= 0$ otherwise, where $\theta > 0$, define a PDF? Find the corresponding DF.

5. For what values of $K$ do the following functions define the PMF of some RV?
(a) $f(x) = K(\lambda^x/x!)$, $x = 0, 1, 2, \ldots$, $\lambda > 0$.
(b) $f(x) = K/N$, $x = 1, 2, \ldots, N$.

6. Show that the function
$$f(x) = \tfrac12 e^{-|x|}, \qquad -\infty < x < \infty,$$
is a PDF. Find its DF.

7. For the PDF $f(x) = x$ if $0 \le x < 1$, and $= 2 - x$ if $1 \le x \le 2$, find
P(l«Xzil. 
8. Which of the following functions are density functions?
(a) $f(x) = x(2 - x)$, $0 < x < 2$, and 0 elsewhere.
(b) $f(x) = x(2x - 1)$, $0 < x < 2$, and 0 elsewhere.
(c) $f(x) = (1/\lambda)\exp\{-(x - \theta)/\lambda\}$, $x > \theta$, and 0 elsewhere, $\lambda > 0$.
(d) $f(x) = \sin x$, $0 < x < \pi/2$, and 0 elsewhere.
(e) $f(x) = 0$ for $x < 0$, $= (x + 1)/9$ for $0 < x < 1$, $= 2(2x - 1)/9$ for
1 <x < 3, = 2(5 — 2x)/9 for 3 <x < l, = 5; for2 < x < 5, and 0 
elsewhere. 
(f) $f(x) = 1/[\pi(1 + x^2)]$, $x \in \mathbb{R}$.
9. Are the following functions distribution functions? If so, find the corresponding density or probability functions.
(a) $F(x) = 0$ for $x < 0$, $= x/2$ for $0 \le x < 1$, $= 1$ for $1 \le x < 2$, $= x/4$ for $2 \le x < 4$, and $= 1$ for $x \ge 4$.
(b) $F(x) = 0$ if $x < -\theta$, $= \frac12(x/\theta + 1)$ if $|x| \le \theta$, and $= 1$ for $x > \theta$, where $\theta > 0$.
(c) $F(x) = 0$ if $x < 0$, and $= 1 - (1 + x)\exp(-x)$ if $x \ge 0$.
(d) $F(x) = 0$ if $x < 1$, $= (x - 1)^2/8$ if $1 \le x < 3$, and $= 1$ for $x \ge 3$.
(e) F(x) = Oif x < 0, and = 1 ех if x > 0. 
10. Suppose that $P\{X > x\}$ is given for a random variable $X$ (of the continuous type) for all $x$. How will you find the corresponding density function? In particular, find the density function in each of the following cases:
(a) $P\{X > x\} = 1$ if $x < 0$, and $P\{X > x\} = e^{-\lambda x}$ for $x \ge 0$; $\lambda > 0$ is a constant.
(b) $P\{X > x\} = 1$ if $x < 0$, and $= (1 + x/\lambda)^{-\lambda}$ for $x \ge 0$; $\lambda > 0$ is a constant.
(c) $P\{X > x\} = 1$ if $x < 0$, and $= 3/(1 + x)^2 - 2/(1 + x)^3$ if $x \ge 0$.
(d) $P\{X > x\} = 1$ if $x < x_0$, and $= (x_0/x)^\alpha$ if $x \ge x_0$; $x_0 > 0$ and $\alpha > 0$ are constants.


2.5 FUNCTIONS OF A RANDOM VARIABLE 


Let X be an RV with a known distribution, and let g be a function defined on the real 
line. We seek the distribution of Y = g(X), provided that Y is also an RV. We first 
prove the following result. 


Theorem 1. Let $X$ be an RV defined on $(\Omega, \mathcal{S}, P)$. Also, let $g$ be a Borel-measurable function on $\mathbb{R}$. Then $g(X)$ is also an RV.


Proof. For $y \in \mathbb{R}$, we have

$$\{g(X) \le y\} = \{X \in g^{-1}(-\infty, y]\},$$

and since $g$ is Borel-measurable, $g^{-1}(-\infty, y]$ is a Borel set. It follows that $\{g(X) \le y\} \in \mathcal{S}$, and the proof is complete.


Theorem 2. Given an RV $X$ with a known DF, the distribution of the RV $Y = g(X)$, where $g$ is a Borel-measurable function, is determined.

Proof. Indeed, for all $y \in \mathbb{R}$,

$$P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}. \qquad (1)$$


In what follows we always assume that the functions under consideration are 
Borel-measurable. 


Example 1. Let $X$ be an RV with DF $F$. Then $|X|$, $aX + b$ (where $a \ne 0$ and $b$ are constants), $X^k$ (where $k > 0$ is an integer), and $|X|^\alpha$ ($\alpha > 0$) are all RVs. Define

$$X^+ = \begin{cases} X, & X \ge 0, \\ 0, & X < 0, \end{cases}$$

and

$$X^- = \begin{cases} X, & X \le 0, \\ 0, & X > 0. \end{cases}$$

Then $X^+$, $X^-$ are also RVs. We have


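$$P\{|X| \le y\} = P\{-y \le X \le y\} = P\{X \le y\} - P\{X < -y\} = F(y) - F(-y) + P\{X = -y\}, \qquad y > 0;$$

$$P\{aX + b \le y\} = P\{aX \le y - b\} = \begin{cases} P\left\{X \le \dfrac{y - b}{a}\right\} & \text{if } a > 0, \\[2mm] P\left\{X \ge \dfrac{y - b}{a}\right\} & \text{if } a < 0; \end{cases}$$

and

$$P\{X^+ \le y\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{X \le 0\} & \text{if } y = 0, \\ P\{X < 0\} + P\{0 \le X \le y\} & \text{if } y > 0. \end{cases}$$

Similarly,

$$P\{X^- \le y\} = \begin{cases} 1 & \text{if } y \ge 0, \\ P\{X \le y\} & \text{if } y < 0. \end{cases}$$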


Let $X$ be an RV of the discrete type and $A$ be the countable set such that $P\{X \in A\} = 1$ and $P\{X = x\} > 0$ for $x \in A$. Let $y = g(x)$ be a one-to-one mapping from $A$ onto some set $B$. Then the inverse map, $g^{-1}$, is a single-valued function of $y$. To find $P\{Y = y\}$, we note that

$$P\{Y = y\} = P\{g(X) = y\} = \begin{cases} P\{X = g^{-1}(y)\}, & y \in B, \\ 0, & y \in B^c. \end{cases}$$
Example 2. Let $X$ be a Poisson RV with PMF

$$P\{X = k\} = \begin{cases} e^{-\lambda}\dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots;\ \lambda > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Let $Y = X^2 + 3$. Then $y = x^2 + 3$ maps $A = \{0, 1, 2, \ldots\}$ onto $B = \{3, 4, 7, 12, 19, 28, \ldots\}$. The inverse map is $x = \sqrt{y - 3}$, and since there are no negative values in $A$, we take the positive square root of $y - 3$. We have

$$P\{Y = y\} = P\{X = \sqrt{y - 3}\} = \frac{e^{-\lambda}\lambda^{\sqrt{y - 3}}}{(\sqrt{y - 3})!}, \qquad y \in B,$$

and $P\{Y = y\} = 0$ elsewhere.
Actually, the restriction to a single-valued inverse on $g$ is not necessary. If $g$ has a finite (or even a countable) number of inverses for each $y$, from countable additivity of $P$ we have

$$P\{Y = y\} = P\{g(X) = y\} = P\Big\{\bigcup_a [X = a,\ g(a) = y]\Big\} = \sum_a P\{X = a,\ g(a) = y\}.$$
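Example 3. Let $X$ be an RV with PMF

$$P\{X = -2\} = \tfrac15, \quad P\{X = -1\} = \tfrac16, \quad P\{X = 0\} = \tfrac15, \quad P\{X = 1\} = \tfrac{1}{15}, \quad P\{X = 2\} = \tfrac{11}{30}.$$

Let $Y = X^2$. Then

$$A = \{-2, -1, 0, 1, 2\} \quad\text{and}\quad B = \{0, 1, 4\}.$$

We have

$$P\{Y = y\} = \begin{cases} \tfrac15, & y = 0, \\ \tfrac16 + \tfrac{1}{15} = \tfrac{7}{30}, & y = 1, \\ \tfrac15 + \tfrac{11}{30} = \tfrac{17}{30}, & y = 4. \end{cases}$$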
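The countable-additivity formula above amounts to "add up $P\{X = a\}$ over every $a$ with $g(a) = y$". A minimal sketch with exact fractions, using the PMF of Example 3 (the helper `pmf_of_function` is hypothetical, written only for illustration):

```python
from collections import defaultdict
from fractions import Fraction as Fr

# PMF of X in Example 3
pmf_x = {-2: Fr(1, 5), -1: Fr(1, 6), 0: Fr(1, 5), 1: Fr(1, 15), 2: Fr(11, 30)}


def pmf_of_function(pmf, g):
    """PMF of Y = g(X): P{Y = y} is the sum of P{X = a} over all a with g(a) = y."""
    pmf_y = defaultdict(Fr)  # Fr() is Fraction(0)
    for a, p in pmf.items():
        pmf_y[g(a)] += p
    return dict(pmf_y)


pmf_y = pmf_of_function(pmf_x, lambda x: x * x)
print(pmf_y)
```

Because the loop never needs $g^{-1}$, the same helper covers both the one-to-one case and the many-to-one case.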


The case in which X is an RV of the continuous type is not as simple. First we note 
that, if X is a continuous RV and g is some Borel-measurable function, Y = g(X) 
may not be an RV of the continuous type. 


Example 4. Let $X$ be an RV with uniform distribution on $(-1, 1)$; that is, the PDF of $X$ is $f(x) = \frac12$, $-1 < x < 1$, and $= 0$ elsewhere. Let $Y = X^+$. Then, from Example 1,

$$P\{Y \le y\} = \begin{cases} 0, & y < 0, \\ \tfrac12, & y = 0, \\ \tfrac12 + \tfrac{y}{2}, & 0 < y < 1, \\ 1, & y \ge 1. \end{cases}$$

We see that the DF of $Y$ has a jump at $y = 0$ and that $Y$ is neither discrete nor continuous. Note that all we require is that $P\{X < 0\} > 0$ for $X^+$ to be of the mixed type.


Example 4 shows that we need some conditions on g to ensure that g(X) is also 
an RV of the continuous type whenever X is continuous. This is the case when g 
is a continuous monotonic function. A sufficient condition is given in the following 
theorem. 


Theorem 3. Let $X$ be an RV of the continuous type with PDF $f$. Let $y = g(x)$ be differentiable for all $x$ and either $g'(x) > 0$ for all $x$ or $g'(x) < 0$ for all $x$. Then $Y = g(X)$ is also an RV of the continuous type with PDF given by

$$h(y) = \begin{cases} f[g^{-1}(y)]\left|\dfrac{d}{dy}g^{-1}(y)\right|, & \alpha < y < \beta, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

where $\alpha = \min\{g(-\infty), g(+\infty)\}$ and $\beta = \max\{g(-\infty), g(+\infty)\}$.

Proof. If $g$ is differentiable for all $x$ and $g'(x) > 0$ for all $x$, then $g$ is continuous and strictly increasing, the limits $\alpha$, $\beta$ exist (and may be infinite), and the inverse function $x = g^{-1}(y)$ exists, is strictly increasing, and is differentiable. The DF of $Y$ for $\alpha < y < \beta$ is given by

$$P\{Y \le y\} = P\{X \le g^{-1}(y)\}.$$

The PDF of $Y$ is obtained on differentiation. We have

$$h(y) = \frac{d}{dy}P\{Y \le y\} = f[g^{-1}(y)]\,\frac{d}{dy}g^{-1}(y).$$

Similarly, if $g' < 0$, then $g$ is strictly decreasing and we have

$$P\{Y \le y\} = P\{X \ge g^{-1}(y)\} = 1 - P\{X < g^{-1}(y)\} \qquad (X \text{ is a continuous RV}),$$

so that

$$h(y) = -f[g^{-1}(y)]\,\frac{d}{dy}g^{-1}(y).$$

Since $g$ and $g^{-1}$ are both strictly decreasing, $(d/dy)g^{-1}(y)$ is negative and (2) follows.
Note that

$$\frac{d}{dy}g^{-1}(y) = \frac{1}{dg(x)/dx}\bigg|_{x = g^{-1}(y)},$$

so that (2) may be rewritten as

$$h(y) = \frac{f(x)}{|dg(x)/dx|}\bigg|_{x = g^{-1}(y)}, \qquad \alpha < y < \beta. \qquad (3)$$





Remark 1. The key to computation of the induced distribution of $Y = g(X)$ from the distribution of $X$ is (1). If the conditions of Theorem 3 are satisfied, we are able to identify the set $\{X \in g^{-1}(-\infty, y]\}$ as $\{X \le g^{-1}(y)\}$ or $\{X \ge g^{-1}(y)\}$, according to whether $g$ is increasing or decreasing. In practice, Theorem 3 is quite useful, but whenever the conditions are violated, one should return to (1) to compute the induced distribution. This is the case, for example, in Examples 7 and 8 and Theorem 4 below.


Remark 2. If the PDF $f$ of $X$ vanishes outside an interval $[a, b]$ of finite length, we need only assume that $g$ is differentiable in $(a, b)$ and either $g'(x) > 0$ or $g'(x) < 0$ throughout the interval. Then we take

$$\alpha = \min\{g(a), g(b)\} \quad\text{and}\quad \beta = \max\{g(a), g(b)\}$$

in Theorem 3.


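Example 5. Let $X$ have the density $f(x) = 1$, $0 < x < 1$, and $= 0$ otherwise. Let $Y = e^X$. Then $X = \log Y$, and we have

$$h(y) = \left|\frac{1}{y}\right| \cdot 1, \qquad 0 < \log y < 1,$$

that is,

$$h(y) = \begin{cases} \dfrac{1}{y}, & 1 < y < e, \\ 0, & \text{otherwise}. \end{cases}$$

If $y = -2\log x$, then $x = e^{-y/2}$ and

$$h(y) = \begin{cases} \left|-\tfrac12 e^{-y/2}\right| \cdot 1 = \tfrac12 e^{-y/2}, & 0 < y < \infty \quad (\text{i.e., } 0 < e^{-y/2} < 1), \\ 0, & \text{otherwise}. \end{cases}$$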
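Formula (2) turns into a one-line function once $g^{-1}$ and its derivative are available. The minimal sketch below is our own illustration (the names are hypothetical). It takes $g^{-1}$ and $dg^{-1}/dy$ as callables and reproduces both densities of Example 5; points outside $(\alpha, \beta)$ get density 0 because $f$ vanishes there.

```python
import math


def pdf_of_monotone_transform(f, g_inv, dg_inv, y):
    """Formula (2): h(y) = f(g^{-1}(y)) * |d/dy g^{-1}(y)| for strictly monotone g."""
    return f(g_inv(y)) * abs(dg_inv(y))


def uniform_pdf(x):
    """PDF of X in Example 5: 1 on (0, 1), 0 elsewhere."""
    return 1.0 if 0 < x < 1 else 0.0


def h_exp(y):
    """PDF of Y = e^X; g^{-1}(y) = log y is defined only for y > 0."""
    if y <= 0:
        return 0.0
    return pdf_of_monotone_transform(uniform_pdf, math.log, lambda t: 1 / t, y)


def h_neg2log(y):
    """PDF of Y = -2 log X; g^{-1}(y) = exp(-y/2), with derivative -exp(-y/2)/2."""
    return pdf_of_monotone_transform(
        uniform_pdf, lambda t: math.exp(-t / 2), lambda t: -math.exp(-t / 2) / 2, y
    )


print(h_exp(2.0), h_neg2log(1.0))
```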
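Example 6. Let $X$ be a nonnegative RV of the continuous type with PDF $f$, and let $\alpha > 0$. Let $Y = X^\alpha$. Then

$$P\{X^\alpha \le y\} = \begin{cases} P\{X \le y^{1/\alpha}\} & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$

The PDF of $Y$ is given by

$$h(y) = f(y^{1/\alpha})\left|\frac{d}{dy}y^{1/\alpha}\right| = \begin{cases} \dfrac{1}{\alpha}\,y^{1/\alpha - 1}f(y^{1/\alpha}), & y > 0, \\ 0, & y \le 0. \end{cases}$$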


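Example 7. Let $X$ be an RV with PDF

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Let $Y = X^2$. In this case, $g'(x) = 2x$, which is $> 0$ for $x > 0$ and $< 0$ for $x < 0$, so that the conditions of Theorem 3 are not satisfied. But for $y > 0$,

$$P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\} = F(\sqrt{y}) - F(-\sqrt{y}),$$

where $F$ is the DF of $X$. Thus the PDF of $Y$ is given by

$$h(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f(\sqrt{y}) + f(-\sqrt{y})\right], & y > 0, \\ 0, & y \le 0. \end{cases}$$

Thus

$$h(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}}e^{-y/2}, & 0 < y, \\ 0, & y \le 0. \end{cases}$$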
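To see Example 7 numerically, note that for standard normal $X$ we have $F(\sqrt{y}) - F(-\sqrt{y}) = \operatorname{erf}(\sqrt{y/2})$, a standard identity that we add here. The minimal sketch below compares that DF with a Monte Carlo estimate of $P\{X^2 \le 1\}$ and checks, by a central difference, that its derivative is the $h(y)$ above. The seed and sample size are arbitrary choices of ours.

```python
import math
import random


def cdf_of_square(y: float) -> float:
    """P{X^2 <= y} = F(sqrt y) - F(-sqrt y) for standard normal X, written via erf."""
    return math.erf(math.sqrt(y / 2)) if y > 0 else 0.0


def pdf_of_square(y: float) -> float:
    """h(y) = exp(-y/2) / sqrt(2 pi y) for y > 0 (Example 7)."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y) if y > 0 else 0.0


rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 200_000
hits = sum(rng.gauss(0.0, 1.0) ** 2 <= 1.0 for _ in range(n))
print(hits / n, cdf_of_square(1.0))
```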
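Example 8. Let $X$ be an RV with PDF

$$f(x) = \begin{cases} \dfrac{2x}{\pi^2}, & 0 < x < \pi, \\ 0, & \text{otherwise}. \end{cases}$$

Let $Y = \sin X$. In this case $g'(x) = \cos x > 0$ for $x$ in $(0, \pi/2)$ and $< 0$ for $x$ in $(\pi/2, \pi)$, so that the conditions of Theorem 3 are not satisfied. To compute the PDF of $Y$, we return to (1) and see that (Fig. 1) the DF of $Y$ is given by

$$P\{Y \le y\} = P\{\sin X \le y\} = P\{[0 < X \le x_1] \cup [x_2 \le X < \pi]\}, \qquad 0 < y < 1,$$

where $x_1 = \sin^{-1} y$ and $x_2 = \pi - \sin^{-1} y$. Thus

$$P\{Y \le y\} = \int_0^{x_1} f(x)\,dx + \int_{x_2}^{\pi} f(x)\,dx = \left(\frac{x_1}{\pi}\right)^2 + 1 - \left(\frac{x_2}{\pi}\right)^2,$$

and the PDF of $Y$ is given by

$$h(y) = \frac{d}{dy}\left(\frac{\sin^{-1} y}{\pi}\right)^2 - \frac{d}{dy}\left(\frac{\pi - \sin^{-1} y}{\pi}\right)^2 = \begin{cases} \dfrac{2}{\pi\sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 1. $y = \sin x$, $0 < x < \pi$.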
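Expanding the squares in Example 8 gives $P\{Y \le y\} = (2/\pi)\sin^{-1} y$ for $0 < y < 1$, whose derivative is the $h(y)$ above. That line of algebra is ours, not the text's. The minimal sketch below checks the identity and compares it with a simulation. $X$ is drawn by inverting $F_X(x) = (x/\pi)^2$, and the seed and sample size are arbitrary.

```python
import math
import random


def cdf_sin_x(y: float) -> float:
    """P{sin X <= y}, 0 < y < 1, from Example 8: (x1/pi)^2 + 1 - (x2/pi)^2."""
    x1 = math.asin(y)
    x2 = math.pi - x1
    return (x1 / math.pi) ** 2 + 1 - (x2 / math.pi) ** 2


def sample_x(rng: random.Random) -> float:
    """Draw X with PDF 2x/pi^2 on (0, pi) by inverting its DF F(x) = (x/pi)^2."""
    return math.pi * math.sqrt(rng.random())


rng = random.Random(2024)  # fixed seed for reproducibility
n = 200_000
hits = sum(math.sin(sample_x(rng)) <= 0.5 for _ in range(n))
print(hits / n, cdf_sin_x(0.5), 2 / math.pi * math.asin(0.5))
```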


In Examples 7 and 8 the real line (Example 7) or the support of $f$ (Example 8) splits into two intervals, on each of which $y = g(x)$ is monotone. We applied Theorem 3 to each of these monotonic pieces and added the results. These two examples are special cases of the following result.
Theorem 4. Let $X$ be an RV of the continuous type with PDF $f$. Let $y = g(x)$ be differentiable for all $x$, and assume that $g'(x)$ is continuous and nonzero at all but a finite number of values of $x$. Then for every real number $y$,

(a) there exist a positive integer $n = n(y)$ and real numbers (inverses) $x_1(y), x_2(y), \ldots, x_n(y)$ such that

$$g[x_k(y)] = y \quad\text{and}\quad g'[x_k(y)] \ne 0, \qquad k = 1, 2, \ldots, n(y),$$

or

(b) there does not exist any $x$ such that $g(x) = y$, $g'(x) \ne 0$, in which case we write $n(y) = 0$.

Then $Y$ is a continuous RV with PDF given by

$$h(y) = \begin{cases} \displaystyle\sum_{k=1}^{n} f[x_k(y)]\,\big|g'[x_k(y)]\big|^{-1} & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases}$$


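Example 9. Let $X$ be an RV with PDF $f$, and let $Y = |X|$. Here $n(y) = 2$, $x_1(y) = y$, $x_2(y) = -y$ for $y > 0$, and

$$h(y) = \begin{cases} f(y) + f(-y), & y > 0, \\ 0, & y \le 0. \end{cases}$$

Thus, if $f(x) = \frac12$, $-1 < x < 1$, and $= 0$ otherwise, then

$$h(y) = \begin{cases} 1, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$

If $f(x) = (1/\sqrt{2\pi})e^{-x^2/2}$, $-\infty < x < \infty$, then

$$h(y) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\,e^{-y^2/2}, & y > 0, \\ 0, & \text{otherwise}. \end{cases}$$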
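Theorem 4 is a sum of Theorem 3 terms over all inverses of $y$. A minimal sketch of that recipe follows. The function names are ours, and the caller must supply the list of inverses, which is assumed to be known in closed form. We apply it to $Y = |X|$ from Example 9 with both choices of $f$.

```python
import math


def pdf_sum_over_inverses(f, inverses, g_prime, y):
    """Theorem 4: h(y) = sum of f(x_k) / |g'(x_k)| over the inverses x_k of y."""
    return sum(f(x) / abs(g_prime(x)) for x in inverses(y))


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def uniform_pdf(x):
    """f(x) = 1/2 on (-1, 1), 0 elsewhere."""
    return 0.5 if -1 < x < 1 else 0.0


def abs_inverses(y):
    """For g(x) = |x|: inverses y and -y when y > 0; none otherwise, so n(y) = 0."""
    return [y, -y] if y > 0 else []


def abs_derivative(x):
    """g'(x) for g(x) = |x|, x != 0."""
    return 1.0 if x > 0 else -1.0


print(pdf_sum_over_inverses(std_normal_pdf, abs_inverses, abs_derivative, 1.0))
```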


Example 10. Let $X$ be an RV of the continuous type with PDF $f$, and let $Y = X^{2m}$, where $m$ is a positive integer. In this case $g(x) = x^{2m}$, $g'(x) = 2mx^{2m-1} > 0$ for $x > 0$ and $g'(x) < 0$ for $x < 0$. Writing $n = 2m$, we see that for any $y > 0$, $n(y) = 2$, $x_1(y) = -y^{1/n}$, $x_2(y) = y^{1/n}$. It follows that

$$h(y) = \begin{cases} f[x_1(y)]\,\dfrac{1}{ny^{1 - 1/n}} + f[x_2(y)]\,\dfrac{1}{ny^{1 - 1/n}} = \dfrac{1}{ny^{1 - 1/n}}\left[f(y^{1/n}) + f(-y^{1/n})\right] & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$

In particular, if $f$ is the PDF given in Example 7, then

$$h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\,ny^{1 - 1/n}}\exp\left(-\dfrac{y^{2/n}}{2}\right) & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$


Remark 3. The basic formula (1) and the countable additivity of probability allow us to compute the distribution of $Y = g(X)$ in some instances even if $g$ has a countable number of inverses. Let $A \subseteq \mathbb{R}$ and let $g$ map $A$ into $B \subseteq \mathbb{R}$. Suppose that $A$ can be represented as a countable union of disjoint sets $A_k$, $k = 1, 2, \ldots$. Then the DF of $Y$ is given by

$$\begin{aligned} P\{Y \le y\} &= P\{X \in g^{-1}(-\infty, y]\} \\ &= P\Big\{X \in \bigcup_{k=1}^{\infty}\big[g^{-1}(-\infty, y] \cap A_k\big]\Big\} \\ &= \sum_{k=1}^{\infty}P\{X \in A_k \cap g^{-1}(-\infty, y]\}. \end{aligned}$$

If the conditions of Theorem 3 are satisfied by the restriction of $g$ to each $A_k$, we may obtain the PDF of $Y$ on differentiating the DF of $Y$. We remind the reader that term-by-term differentiation is permissible if the differentiated series is uniformly convergent.


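Example 11. Let $X$ be an RV with PDF

$$f(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad \theta > 0.$$

Let $Y = \sin X$, and let $\sin^{-1} y$ be the principal value. Then (Fig. 2) for $0 < y < 1$,

$$\begin{aligned}
P\{\sin X \le y\} &= P\{0 < X \le \sin^{-1} y \text{ or } (2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y \text{ for some integer } n \ge 1\} \\
&= P\{0 < X \le \sin^{-1} y\} + \sum_{n=1}^{\infty}P\{(2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y\} \\
&= 1 - e^{-\theta\sin^{-1} y} + \sum_{n=1}^{\infty}\left(e^{-\theta[(2n - 1)\pi - \sin^{-1} y]} - e^{-\theta(2n\pi + \sin^{-1} y)}\right) \\
&= 1 - e^{-\theta\sin^{-1} y} + \left(e^{\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}\right)\sum_{n=1}^{\infty}e^{-2n\theta\pi} \\
&= 1 - e^{-\theta\sin^{-1} y} + \left(e^{\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}\right)\frac{e^{-2\theta\pi}}{1 - e^{-2\theta\pi}} \\
&= 1 + \frac{e^{-\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}}{1 - e^{-2\theta\pi}}.
\end{aligned}$$

Fig. 2. $y = \sin x$, $x > 0$.

A similar computation can be made for $y < 0$. It follows that the PDF of $Y$ is given by

$$h(y) = \begin{cases}
\theta e^{-\theta\pi}(1 - e^{-2\theta\pi})^{-1}(1 - y^2)^{-1/2}\left(e^{\theta\sin^{-1} y} + e^{-\theta\pi - \theta\sin^{-1} y}\right) & \text{if } -1 < y < 0, \\
\theta(1 - e^{-2\theta\pi})^{-1}(1 - y^2)^{-1/2}\left(e^{-\theta\sin^{-1} y} + e^{-\theta\pi + \theta\sin^{-1} y}\right) & \text{if } 0 < y < 1, \\
0 & \text{otherwise}.
\end{cases}$$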
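A useful check on a long computation like Example 11 is that the resulting $h$ integrates to 1. A minimal sketch follows. The substitution $y = \sin t$ and the midpoint rule are our own choices of method, and the $\theta$ values are illustrative. The sketch also confirms that $h$ jumps by exactly $\theta$ at $y = 0$, which matches the fact that the branch $X$ near 0 feeds only $y > 0$.

```python
import math


def h_example11(y: float, theta: float) -> float:
    """PDF of Y = sin X when X has PDF theta * exp(-theta x), x > 0 (Example 11)."""
    if not -1 < y < 1:
        return 0.0
    s = math.asin(y)  # principal value
    c = theta / ((1 - math.exp(-2 * theta * math.pi)) * math.sqrt(1 - y * y))
    if y > 0:
        return c * (math.exp(-theta * s) + math.exp(-theta * math.pi + theta * s))
    # -1 < y <= 0 (the value at the single point y = 0 does not matter)
    return c * math.exp(-theta * math.pi) * (
        math.exp(theta * s) + math.exp(-theta * math.pi - theta * s)
    )


def total_mass(theta: float, n: int = 20_000) -> float:
    """Integral of h over (-1, 1), via y = sin t and the midpoint rule on (-pi/2, pi/2).

    The substitution cancels the (1 - y^2)^(-1/2) singularity at y = +-1; n is even,
    so t = 0 (where h jumps) falls on a cell boundary.
    """
    a, b = -math.pi / 2, math.pi / 2
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * width
        total += h_example11(math.sin(t), theta) * math.cos(t) * width
    return total


for theta in (0.5, 1.0, 3.0):
    print(theta, total_mass(theta))
```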
PROBLEMS 2.5

1. Let $X$ be a random variable with probability mass function
$$P\{X = r\} = \binom{n}{r}p^r(1 - p)^{n - r}, \qquad r = 0, 1, 2, \ldots, n, \quad 0 < p < 1.$$
Find the PMFs of the RVs (a) $Y = aX + b$, (b) $Y = X^2$, and (c) $Y = \sqrt{X}$.
2. Let $X$ be an RV with PDF
$$f(x) = \begin{cases} 0 & \text{if } x \le 0, \\ \tfrac12 & \text{if } 0 < x \le 1, \\ \dfrac{1}{2x^2} & \text{if } 1 < x < \infty. \end{cases}$$
Find the PDF of the RV $1/X$.

3. Let $X$ be a positive RV of the continuous type with PDF $f(\cdot)$. Find the PDF of the RV $U = X/(1 + X)$. If, in particular, $X$ has the PDF
$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}, \end{cases}$$
what is the PDF of $U$?


4. Let $X$ be an RV with PDF $f$ defined by Example 11. Let $Y = \cos X$ and $Z = \tan X$. Find the DFs and PDFs of $Y$ and $Z$.

5. Let $X$ be an RV with PDF
$$f_\theta(x) = \begin{cases} \theta e^{-\theta x} & \text{if } x \ge 0, \\ 0 & \text{otherwise}, \end{cases}$$
where $\theta > 0$. Let $Y = (X - 1/\theta)^2$. Find the PDF of $Y$.


6. A point is chosen at random on the circumference of a circle of radius $r$ with center at the origin; that is, the polar angle $\theta$ of the point chosen has the PDF
$$f(\theta) = \frac{1}{2\pi}, \qquad \theta \in (-\pi, \pi).$$
Find the PDF of the abscissa of the point selected.


7. For the RV $X$ of Example 7, find the PDF of the following RVs: (a) $Y_1 = e^X$, (b) $Y_2 = 2X^2 + 1$, and (c) $Y_3 = g(X)$, where $g(x) = 1$ if $x > 0$, $= 1$ if $x = 0$, and $= -1$ if $x < 0$.


8. Suppose that a projectile is fired at an angle $\theta$ above the earth with a velocity $V$. Assuming that $\theta$ is an RV with PDF
$$f(\theta) = \begin{cases} \dfrac{12}{\pi} & \text{if } \dfrac{\pi}{6} < \theta < \dfrac{\pi}{4}, \\ 0 & \text{otherwise}, \end{cases}$$
find the PDF of the range $R$ of the projectile, where $R = V^2\sin 2\theta/g$, $g$ being the acceleration due to gravity.
9. Let $X$ be an RV with PDF $f(x) = 1/(2\pi)$ if $0 < x < 2\pi$, and $= 0$ otherwise. Let $Y = \sin X$. Find the DF and PDF of $Y$.

10. Let $X$ be an RV with PDF $f(x) = \frac13$ if $-1 < x < 2$, and $= 0$ otherwise. Let $Y = |X|$. Find the PDF of $Y$.

11. Let $X$ be an RV with PDF $f(x) = 1/(2\theta)$ if $-\theta < x < \theta$, and $= 0$ otherwise. Let $Y = 1/X^2$. Find the PDF of $Y$.

12. Let $X$ be an RV of the continuous type, and let $Y = g(X)$ be defined as follows:
(a) $g(x) = 1$ if $x > 0$, and $= -1$ if $x \le 0$.
(b) $g(x) = b$ if $x \ge b$, $= x$ if $|x| < b$, and $= -b$ if $x \le -b$.
(c) $g(x) = x$ if $|x| \ge b$, and $= 0$ if $|x| < b$.
Find the distribution of $Y$ in each case.


CHAPTER 3 


Moments and Generating Functions 


3.1 INTRODUCTION 


The study of probability distributions of a random variable is essentially the study 
of some numerical characteristics associated with them. These parameters of the 
distribution play a key role in mathematical statistics. In Section 3.2 we introduce 
some of these parameters, namely, moments and order parameters, and investigate 
their properties. In Section 3.3 the idea of generating functions is introduced. In 
particular, we study probability generating functions, moment generating functions, 
and characteristic functions. In Section 3.4 we deal with some moment inequalities. 


3.2 MOMENTS OF A DISTRIBUTION FUNCTION 


In this section we investigate some numerical characteristics, called parameters, as- 
sociated with the distribution of an RV X. These parameters are moments and their 
functions and order parameters. We concentrate mainly on moments and their prop- 
erties. 

Let $X$ be a random variable of the discrete type with probability mass function $p_k = P\{X = x_k\}$, $k = 1, 2, \ldots$. If

$$\sum_{k=1}^{\infty}|x_k|\,p_k < \infty, \qquad (1)$$

we say that the expected value (or the mean or the mathematical expectation) of $X$ exists and write

$$\mu = EX = \sum_{k=1}^{\infty}x_k p_k. \qquad (2)$$

Note that the series $\sum_{k=1}^{\infty}x_k p_k$ may converge but the series $\sum_{k=1}^{\infty}|x_k|\,p_k$ may not. In that case we say that $EX$ does not exist.
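Example 1. Let $X$ have the PMF given by

$$p_j = P\left\{X = (-1)^{j+1}\frac{3^j}{j}\right\} = \frac{2}{3^j}, \qquad j = 1, 2, \ldots.$$

Then

$$\sum_{j=1}^{\infty}|x_j|\,p_j = \sum_{j=1}^{\infty}\frac{2}{j} = \infty,$$

and $EX$ does not exist, although the series

$$\sum_{j=1}^{\infty}x_j p_j = \sum_{j=1}^{\infty}(-1)^{j+1}\frac{2}{j}$$

is convergent.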
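The contrast in Example 1 shows up clearly in partial sums. Since $x_j p_j = (-1)^{j+1}\,2/j$, the signed series settles near $2\log 2$, which is twice the alternating harmonic series, a standard fact we add here. The absolute series grows like $2\log n$. A minimal sketch follows; it works with the simplified terms, because $3^j$ itself would overflow a float for large $j$.

```python
import math


def partial_sums(n: int):
    """Partial sums of sum x_j p_j and sum |x_j| p_j in Example 1, j = 1..n.

    Uses x_j p_j = (-1)^(j+1) * (3^j / j) * (2 / 3^j) = (-1)^(j+1) * 2/j directly.
    """
    signed = 0.0
    absolute = 0.0
    for j in range(1, n + 1):
        term = 2.0 / j
        signed += term if j % 2 == 1 else -term
        absolute += term
    return signed, absolute


for n in (10, 1_000, 100_000):
    s, a = partial_sums(n)
    print(n, round(s, 6), round(a, 3), round(2 * math.log(2), 6))
```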


If $X$ is of the continuous type and has PDF $f$, we say that $EX$ exists and equals $\int x f(x)\,dx$, provided that

$$\int |x|\,f(x)\,dx < \infty.$$

A similar definition is given for the mean of any Borel-measurable function $h(X)$ of $X$. Thus if $X$ is of the continuous type and has PDF $f$, we say that $Eh(X)$ exists and equals $\int h(x)f(x)\,dx$, provided that

$$\int |h(x)|\,f(x)\,dx < \infty.$$

We emphasize that the condition $\int |x|\,f(x)\,dx < \infty$ must be checked before it can be concluded that $EX$ exists and equals $\int x f(x)\,dx$. Moreover, it is worthwhile to recall at this point that the integral $\int_{-\infty}^{\infty}\varphi(x)\,dx$ exists, provided that the limit $\lim_{a \to \infty,\, b \to \infty}\int_{-a}^{b}\varphi(x)\,dx$ exists. It is quite possible for the limit $\lim_{a \to \infty}\int_{-a}^{a}\varphi(x)\,dx$ to exist without the existence of $\int_{-\infty}^{\infty}\varphi(x)\,dx$. As an example, consider the Cauchy PDF:

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$

Clearly,

$$\lim_{a \to \infty}\frac{1}{\pi}\int_{-a}^{a}\frac{x}{1 + x^2}\,dx = 0.$$

However, $EX$ does not exist since the integral $(1/\pi)\int_{-\infty}^{\infty}|x|/(1 + x^2)\,dx$ diverges.
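The Cauchy example can be made concrete with the closed-form antiderivative $\frac{1}{2\pi}\log(1 + x^2)$ of $\frac{1}{\pi}\,x/(1 + x^2)$. A minimal sketch with our own helper names follows. The symmetric limits give 0, the limits $(-a, 2a)$ give $\log 2/\pi$ no matter how large $a$ is, and the absolute integral grows without bound. So the double limit $a, b \to \infty$ does not exist.

```python
import math


def cauchy_partial_first_moment(a: float, b: float) -> float:
    """(1/pi) * integral from -a to b of x/(1 + x^2) dx = log((1 + b^2)/(1 + a^2)) / (2 pi)."""
    return math.log((1 + b * b) / (1 + a * a)) / (2 * math.pi)


def cauchy_partial_abs_moment(a: float) -> float:
    """(1/pi) * integral from -a to a of |x|/(1 + x^2) dx = log(1 + a^2) / pi."""
    return math.log(1 + a * a) / math.pi


for a in (1e2, 1e4, 1e6):
    print(
        a,
        cauchy_partial_first_moment(a, a),       # symmetric: always 0
        cauchy_partial_first_moment(a, 2 * a),   # tends to log(2)/pi
        cauchy_partial_abs_moment(a),            # grows like 2 log(a)/pi
    )
```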
Remark 1. Let $X(\omega) = I_A(\omega)$ for some $A \in \mathcal{S}$. Then $EX = P(A)$.

Remark 2. If we write $h(X) = |X|$, we see that $EX$ exists if and only if $E|X|$ does.

Remark 3. We say that an RV $X$ is symmetric about a point $\alpha$ if

$$P\{X \ge \alpha + x\} = P\{X \le \alpha - x\} \qquad \text{for all } x.$$

In terms of the DF $F$ of $X$, this means that if

$$F(\alpha - x) = 1 - F(\alpha + x) + P\{X = \alpha + x\}$$

holds for all $x \in \mathbb{R}$, we say that the DF $F$ (or the RV $X$) is symmetric with $\alpha$ as the center of symmetry. If $\alpha = 0$, then for every $x$,

$$F(-x) = 1 - F(x) + P\{X = x\}.$$

In particular, if $X$ is an RV of the continuous type, $X$ is symmetric with center $\alpha$ if and only if the PDF $f$ of $X$ satisfies

$$f(\alpha - x) = f(\alpha + x) \qquad \text{for all } x.$$

If $\alpha = 0$, we will say simply that $X$ is symmetric (or that $F$ is symmetric).

As an immediate consequence of this definition we see that if $X$ is symmetric with $\alpha$ as the center of symmetry and $E|X| < \infty$, then $EX = \alpha$. A simple example of a symmetric distribution is the Cauchy PDF considered above (before Remark 1). We will encounter many such distributions later.

Remark 4. If $a$ and $b$ are constants and $X$ is an RV with $E|X| < \infty$, then $E|aX + b| < \infty$ and $E(aX + b) = aEX + b$. In particular, $E(X - \mu) = 0$, a fact that should not come as a surprise.

Remark 5. If $X$ is bounded, that is, if $P\{|X| \le M\} = 1$, $0 < M < \infty$, then $EX$ exists.

Remark 6. If $P\{X \ge 0\} = 1$ and $EX$ exists, then $EX \ge 0$.


Theorem 1. Let $X$ be an RV and $g$ be a Borel-measurable function on $\mathbb{R}$. Let $Y = g(X)$. If $X$ is of discrete type, then

$$EY = \sum_{j=1}^{\infty}g(x_j)\,P\{X = x_j\} \qquad (3)$$

in the sense that if either side of (3) exists, so does the other, and then the two are equal. If $X$ is of continuous type with PDF $f$, then $EY = \int g(x)f(x)\,dx$ in the sense that if either of the two integrals converges absolutely, so does the other, and the two are equal.


Remark 7. Let $X$ be a discrete RV. Then Theorem 1 says that

$$\sum_{j=1}^{\infty}g(x_j)\,P\{X = x_j\} = \sum_{k=1}^{\infty}y_k\,P\{Y = y_k\}$$

in the sense that if either of the two series converges absolutely, so does the other, and the two sums are equal. If $X$ is of the continuous type with PDF $f$, let $h(y)$ be the PDF of $Y = g(X)$. Then, according to Theorem 1,

$$\int g(x)f(x)\,dx = \int y\,h(y)\,dy,$$

provided that $E|g(X)| < \infty$.


Proof of Theorem 1. In the discrete case, suppose that $P\{X \in A\} = 1$. If $y = g(x)$ is a one-to-one mapping of $A$ onto some set $B$, then

$$P\{Y = y\} = P\{X = g^{-1}(y)\}, \qquad y \in B.$$

We have

$$\sum_{x \in A}g(x)\,P\{X = x\} = \sum_{y \in B}y\,P\{Y = y\}.$$

In the continuous case, suppose that $g$ satisfies the conditions of Theorem 2.5.3. Then

$$\int_{-\infty}^{\infty}g(x)f(x)\,dx = \int_{\alpha}^{\beta}y\,f[g^{-1}(y)]\left|\frac{d}{dy}g^{-1}(y)\right|dy$$

by changing the variable to $y = g(x)$. Thus

$$\int_{-\infty}^{\infty}g(x)f(x)\,dx = \int_{\alpha}^{\beta}y\,h(y)\,dy.$$

The functions $h(x) = x^n$, where $n$ is a positive integer, and $h(x) = |x|^\alpha$, where $\alpha$ is a positive real number, are of special importance. If $EX^n$ exists for some positive integer $n$, we call $EX^n$ the $n$th moment of (the distribution function of) $X$ about the origin. If $E|X|^\alpha < \infty$ for some positive real number $\alpha$, we call $E|X|^\alpha$ the $\alpha$th absolute moment of $X$. We shall use the notation

$$m_n = EX^n \quad\text{and}\quad \beta_\alpha = E|X|^\alpha \qquad (4)$$

whenever the expectations exist.
Example 2. Let $X$ have the uniform distribution on the first $N$ natural numbers; that is, let

$$P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N.$$

Clearly, moments of all order exist:

$$EX = \sum_{k=1}^{N}k \cdot \frac{1}{N} = \frac{N + 1}{2},$$

$$EX^2 = \sum_{k=1}^{N}k^2 \cdot \frac{1}{N} = \frac{(N + 1)(2N + 1)}{6}.$$
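The two closed forms in Example 2 are easy to confirm with exact rational arithmetic. A minimal sketch follows, in which `uniform_moments` is our own helper:

```python
from fractions import Fraction


def uniform_moments(N: int):
    """Exact EX and EX^2 for X uniform on {1, ..., N} (Example 2)."""
    ex = sum(Fraction(k, N) for k in range(1, N + 1))
    ex2 = sum(Fraction(k * k, N) for k in range(1, N + 1))
    return ex, ex2


for N in (1, 6, 10):
    ex, ex2 = uniform_moments(N)
    print(N, ex, Fraction(N + 1, 2), ex2, Fraction((N + 1) * (2 * N + 1), 6))
```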
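Example 3. Let $X$ be an RV with PDF

$$f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1, \\ 0, & x < 1. \end{cases}$$

Then

$$EX = \int_1^{\infty}\frac{2}{x^2}\,dx = 2.$$

But

$$EX^2 = \int_1^{\infty}\frac{2}{x}\,dx$$

does not exist. Indeed, it is easily possible to construct examples of random variables for which all moments of a specified order exist but no higher-order moments do.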


Example 4. Two players, A and B, play a coin-tossing game. A gives B one dollar if a head turns up; otherwise, B pays A one dollar. If the probability that the coin shows a head is $p$, find the expected gain of A.

Let $X$ denote the gain of A. Then

$$P\{X = 1\} = P\{\text{tails}\} = 1 - p, \qquad P\{X = -1\} = P\{\text{heads}\} = p,$$

and

$$EX = 1 \cdot (1 - p) + (-1) \cdot p = 1 - 2p \begin{cases} > 0 & \text{if and only if } p < \tfrac12, \\ = 0 & \text{if and only if } p = \tfrac12. \end{cases}$$

Thus $EX = 0$ if and only if the coin is fair.
Theorem 2. If the moment of order $t$ exists for an RV $X$, moments of order $0 < s < t$ exist.

Proof. Let $X$ be of the continuous type with PDF $f$. We have

$$E|X|^s = \int_{|x| \le 1}|x|^s f(x)\,dx + \int_{|x| > 1}|x|^s f(x)\,dx \le P\{|X| \le 1\} + E|X|^t < \infty.$$

A similar proof can be given when $X$ is a discrete RV.


Theorem 3. Let $X$ be an RV on a probability space $(\Omega, \mathcal{S}, P)$. Let $E|X|^k < \infty$ for some $k > 0$. Then

$$n^k P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty.$$

Proof. We provide the proof for the case in which $X$ is of the continuous type with density $f$. We have

$$\infty > \int |x|^k f(x)\,dx = \lim_{n \to \infty}\int_{|x| \le n}|x|^k f(x)\,dx.$$

It follows that

$$\int_{|x| > n}|x|^k f(x)\,dx \to 0 \qquad \text{as } n \to \infty.$$

But

$$\int_{|x| > n}|x|^k f(x)\,dx \ge n^k P\{|X| > n\},$$

completing the proof.


Remark 8. Probabilities of the type $P\{|X| > n\}$ or either of its components, $P\{X > n\}$ or $P\{X < -n\}$, are called tail probabilities. The result of Theorem 3, therefore, gives the rate at which $P\{|X| > n\}$ converges to 0 as $n \to \infty$: it is $o(n^{-k})$.

Remark 9. The converse of Theorem 3 does not hold in general; that is,

$$n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty \text{ for some } k$$

does not necessarily imply that $E|X|^k < \infty$, for consider the RV

$$P\{X = n\} = \frac{c}{n^2\log n}, \qquad n = 2, 3, \ldots,$$

where $c$ is a constant determined from

$$\sum_{n=2}^{\infty}\frac{c}{n^2\log n} = 1.$$

We have

$$P\{X > n\} \approx c\int_n^{\infty}\frac{dx}{x^2\log x} \sim c\,n^{-1}(\log n)^{-1}$$

and $nP\{X > n\} \to 0$ as $n \to \infty$. (Here and subsequently, $\sim$ means that the ratio of the two sides $\to 1$ as $n \to \infty$.) But

$$EX = \sum_{n=2}^{\infty}\frac{c}{n\log n} = \infty.$$

In fact, we need

$$n^{k + \delta}P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty$$

for some $\delta > 0$ to ensure that $E|X|^k < \infty$. A condition such as this is called a moment condition.


For the proof we need the following lemma.

Lemma 1. Let $X$ be a nonnegative RV with distribution function $F$. Then

$$EX = \int_0^{\infty}[1 - F(x)]\,dx, \qquad (5)$$

in the sense that if either side exists, so does the other and the two are equal.

Proof. If $X$ is of the continuous type with density $f$ and $EX < \infty$, then

$$EX = \int_0^{\infty}x f(x)\,dx = \lim_{n \to \infty}\int_0^n x f(x)\,dx.$$

On integration by parts, we obtain

$$\int_0^n x f(x)\,dx = nF(n) - \int_0^n F(x)\,dx = -n[1 - F(n)] + \int_0^n [1 - F(x)]\,dx.$$

But

$$n[1 - F(n)] = n\int_n^{\infty}f(x)\,dx \le \int_n^{\infty}x f(x)\,dx,$$

and since $E|X| < \infty$, it follows that

$$n[1 - F(n)] \to 0 \qquad \text{as } n \to \infty.$$




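We have

$$EX = \lim_{n \to \infty}\int_0^n x f(x)\,dx = \lim_{n \to \infty}\int_0^n [1 - F(x)]\,dx = \int_0^{\infty}[1 - F(x)]\,dx.$$

If $\int_0^{\infty}[1 - F(x)]\,dx < \infty$, then

$$\int_0^n x f(x)\,dx \le \int_0^n [1 - F(x)]\,dx \le \int_0^{\infty}[1 - F(x)]\,dx,$$

and it follows that $E|X| < \infty$.

We leave the reader to complete the proof in the discrete case.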


Corollary 1. For any RV $X$, $E|X| < \infty$ if and only if the integrals $\int_{-\infty}^{0}P\{X \le x\}\,dx$ and $\int_0^{\infty}P\{X > x\}\,dx$ both converge, and in that case

$$EX = \int_0^{\infty}P\{X > x\}\,dx - \int_{-\infty}^{0}P\{X \le x\}\,dx.$$

Actually, we can get a little more out of Lemma 1 than the corollary above. In fact,

$$E|X|^\alpha = \int_0^{\infty}P\{|X|^\alpha > x\}\,dx = \alpha\int_0^{\infty}x^{\alpha - 1}P\{|X| > x\}\,dx,$$

and we see that an RV $X$ possesses an absolute moment of order $\alpha > 0$ if and only if $|x|^{\alpha - 1}P\{|X| > x\}$ is integrable over $(0, \infty)$.
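Lemma 1 and the tail formula for $E|X|^\alpha$ can be checked numerically on a distribution whose moments are known. The minimal sketch below assumes, purely for illustration, an exponential RV with rate 2, so that $P\{X > x\} = e^{-2x}$, $EX = 1/2$ and $EX^2 = 2/2^2 = 1/2$. It integrates the tail with a plain midpoint rule, truncated at 40, where the tail is about $e^{-80}$ and therefore negligible.

```python
import math


def midpoint_integral(func, a, b, n=200_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    width = (b - a) / n
    return width * sum(func(a + (i + 0.5) * width) for i in range(n))


LAM = 2.0  # illustrative rate: P{X > x} = exp(-LAM x) for x >= 0


def survival(x):
    return math.exp(-LAM * x)


# Lemma 1: EX = integral of [1 - F(x)] dx = integral of P{X > x} dx; exact value 1/LAM.
mean_via_tail = midpoint_integral(survival, 0.0, 40.0)

# E|X|^alpha = alpha * integral of x^(alpha - 1) P{|X| > x} dx; for alpha = 2 it is 2/LAM^2.
alpha = 2.0
second_moment_via_tail = midpoint_integral(
    lambda x: alpha * x ** (alpha - 1) * survival(x), 0.0, 40.0
)
print(mean_via_tail, second_moment_via_tail)
```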
A simple application of the integral test leads to the following moments lemma.

Lemma 2.

$$E|X|^\alpha < \infty \iff \sum_{n=1}^{\infty}P\{|X| > n^{1/\alpha}\} < \infty. \qquad (6)$$

Note that an immediate consequence of Lemma 2 is Theorem 3. We are now ready to prove the following result.


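Theorem 4. Let $X$ be an RV with a distribution satisfying $n^\alpha P\{|X| > n\} \to 0$ as $n \to \infty$ for some $\alpha > 0$. Then $E|X|^\beta < \infty$ for $0 < \beta < \alpha$.

Proof. Given $\varepsilon > 0$, we can choose an $N = N(\varepsilon)$ such that

$$P\{|X| > n\} < \frac{\varepsilon}{n^\alpha} \qquad \text{for all } n \ge N.$$

It follows that for $0 < \beta < \alpha$,

$$\begin{aligned} E|X|^\beta &= \beta\int_0^N x^{\beta - 1}P\{|X| > x\}\,dx + \beta\int_N^{\infty}x^{\beta - 1}P\{|X| > x\}\,dx \\ &\le N^\beta + \beta\varepsilon\int_N^{\infty}x^{\beta - \alpha - 1}\,dx \\ &< \infty. \end{aligned}$$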


Remark 10. Using Theorems 3 and 4, we demonstrate the existence of random variables for which moments of any order do not exist, that is, for which $E|X|^\alpha = \infty$ for every $\alpha > 0$. For such an RV, $n^\alpha P\{|X| > n\} \not\to 0$ as $n \to \infty$ for any $\alpha > 0$. Consider, for example, the RV $X$ with PDF

$$f(x) = \begin{cases} \dfrac{1}{2|x|(\log|x|)^2} & \text{if } |x| > e, \\ 0 & \text{otherwise}. \end{cases}$$

The DF of $X$ is given by

$$F(x) = \begin{cases} \dfrac{1}{2\log|x|} & \text{if } x \le -e, \\ \dfrac12 & \text{if } -e < x < e, \\ 1 - \dfrac{1}{2\log x} & \text{if } x \ge e. \end{cases}$$

Then for $x > e$,

$$P\{|X| > x\} = 1 - F(x) + F(-x) = \frac{1}{\log x},$$

and $x^\alpha P\{|X| > x\} \to \infty$ as $x \to \infty$ for any $\alpha > 0$. It follows that $E|X|^\alpha = \infty$ for every $\alpha > 0$. In this example we see that $P\{|X| > cx\}/P\{|X| > x\} \to 1$ as $x \to \infty$ for every $c > 0$. A positive function $L(\cdot)$ defined on $(0, \infty)$ is said to be a function of slow variation if and only if $L(cx)/L(x) \to 1$ as $x \to \infty$ for every $c > 0$. For such a function $x^\alpha L(x) \to \infty$ for every $\alpha > 0$ (see Feller [23, pp. 275-279]). It follows that if $P\{|X| > x\}$ is slowly varying, $E|X|^\alpha = \infty$ for every $\alpha > 0$. Functions of slow variation play an important role in the theory of probability.


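Random variables for which $P\{|X| > x\}$ is slowly varying are clearly excluded from the domain of the following result.

Theorem 5. Let $X$ be an RV satisfying

$$\frac{P\{|X| > cx\}}{P\{|X| > x\}} \to 0 \qquad \text{as } x \to \infty \text{ for all } c > 1; \qquad (7)$$

then $X$ possesses moments of all orders. [Note that if $c = 1$, the limit in (7) is 1, whereas if $c < 1$, the limit will not go to 0 since $P\{|X| > cx\} \ge P\{|X| > x\}$.]

Proof. Let $\varepsilon > 0$ (we will choose $\varepsilon$ later), and choose $x_0$ so large that

$$\frac{P\{|X| > cx\}}{P\{|X| > x\}} < \varepsilon \qquad \text{for all } x \ge x_0, \qquad (8)$$

and choose $x_1$ so large that

$$P\{|X| > x\} < \varepsilon \qquad \text{for all } x \ge x_1. \qquad (9)$$

Let $N = \max(x_0, x_1)$. We have for a fixed positive integer $r$,

$$\frac{P\{|X| > c^r x\}}{P\{|X| > x\}} = \prod_{p=1}^{r}\frac{P\{|X| > c^p x\}}{P\{|X| > c^{p-1}x\}} \le \varepsilon^r \qquad (10)$$

for $x \ge N$. Thus for $x \ge N$ we have, in view of (9),

$$P\{|X| > c^r x\} \le \varepsilon^{r+1}. \qquad (11)$$

Next note that for any fixed positive integer $n$,

$$E|X|^n = n\int_0^{\infty}x^{n-1}P\{|X| > x\}\,dx = n\int_0^N x^{n-1}P\{|X| > x\}\,dx + n\int_N^{\infty}x^{n-1}P\{|X| > x\}\,dx. \qquad (12)$$

Since the first integral in (12) is finite, we need only show that the second integral is also finite. Using (11) with $x = N$ (and $r - 1$ in place of $r$), we have

$$\begin{aligned} \int_N^{\infty}x^{n-1}P\{|X| > x\}\,dx &= \sum_{r=1}^{\infty}\int_{c^{r-1}N}^{c^r N}x^{n-1}P\{|X| > x\}\,dx \\ &\le \sum_{r=1}^{\infty}(c^r N)^n\,P\{|X| > c^{r-1}N\} \le \sum_{r=1}^{\infty}(c^r N)^n\varepsilon^r \\ &= N^n\sum_{r=1}^{\infty}(\varepsilon c^n)^r = N^n\,\frac{\varepsilon c^n}{1 - \varepsilon c^n} < \infty, \end{aligned}$$

provided that we choose $\varepsilon$ such that $\varepsilon c^n < 1$. It follows that $E|X|^n < \infty$ for $n = 1, 2, \ldots$. Actually, we have shown that (7) implies that $E|X|^\delta < \infty$ for all $\delta > 0$.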
Theorem 6. If $h_1, h_2, \ldots, h_n$ are Borel-measurable functions of an RV $X$ and $Eh_i(X)$ exists for $i = 1, 2, \ldots, n$, then $E\left[\sum_{i=1}^{n}h_i(X)\right]$ exists and equals $\sum_{i=1}^{n}Eh_i(X)$.

Definition 1. Let $k$ be a positive integer and $c$ be a constant. If $E(X - c)^k$ exists, we call it the moment of order $k$ about the point $c$. If we take $c = EX = \mu$, which exists since $E|X| < \infty$, we call $E(X - \mu)^k$ the central moment of order $k$ or the moment of order $k$ about the mean. We shall write

$$\mu_k = E(X - \mu)^k.$$

If we know $m_1, m_2, \ldots, m_k$, we can compute $\mu_1, \mu_2, \ldots, \mu_k$, and conversely. We have

$$\mu_k = E(X - \mu)^k = m_k - \binom{k}{1}\mu\,m_{k-1} + \binom{k}{2}\mu^2 m_{k-2} - \cdots + (-1)^k\mu^k \qquad (13)$$

and

$$m_k = E(X - \mu + \mu)^k = \mu_k + \binom{k}{1}\mu\,\mu_{k-1} + \binom{k}{2}\mu^2\mu_{k-2} + \cdots + \mu^k. \qquad (14)$$

The case $k = 2$ is of special importance.

Definition 2. If $EX^2$ exists, we call $E(X - \mu)^2$ the variance of $X$, and we write $\sigma^2 = \operatorname{var}(X) = E(X - \mu)^2$. The quantity $\sigma$ is called the standard deviation (SD) of $X$.

From Theorem 6 we see that

$$\sigma^2 = \mu_2 = EX^2 - (EX)^2. \qquad (15)$$

Variance has some important properties.

Theorem 7. $\operatorname{var}(X) = 0$ if and only if $X$ is degenerate.

Theorem 8. $\operatorname{var}(X) < E(X - c)^2$ for any $c \ne EX$.

Proof. We have

$$\operatorname{var}(X) = E(X - \mu)^2 = E(X - c)^2 - (c - \mu)^2.$$

Note that

$$\operatorname{var}(aX + b) = a^2\operatorname{var}(X).$$


Let $EX^2 < \infty$ and $\operatorname{var}(X) > 0$. Then we define

$$Z = \frac{X - EX}{\sqrt{\operatorname{var}(X)}} = \frac{X - \mu}{\sigma} \qquad (16)$$

and see that $EZ = 0$ and $\operatorname{var}(Z) = 1$. We call $Z$ a standardized RV.





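Example 5. Let $X$ be an RV with binomial PMF

$$P\{X = k\} = \binom{n}{k}p^k(1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n;\ 0 < p < 1.$$

Then

$$\begin{aligned}
EX &= \sum_{k=0}^{n}k\binom{n}{k}p^k(1 - p)^{n-k} = np\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}(1 - p)^{n-k} = np; \\
EX^2 &= E[X(X - 1) + X] = \sum_{k=0}^{n}k(k - 1)\binom{n}{k}p^k(1 - p)^{n-k} + np = n(n - 1)p^2 + np; \\
\operatorname{var}(X) &= n(n - 1)p^2 + np - n^2p^2 = np(1 - p); \\
EX^3 &= E[X(X - 1)(X - 2) + 3X(X - 1) + X] = n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np;
\end{aligned}$$

and

$$\begin{aligned}
\mu_3 &= m_3 - 3\mu m_2 + 2\mu^3 \\
&= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np - 3np[n(n - 1)p^2 + np] + 2n^3p^3 \\
&= np(1 - p)(1 - 2p).
\end{aligned}$$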
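The binomial computations can be checked exactly by summing the PMF with rational arithmetic and then applying (15), and (13) with $k = 3$. A minimal sketch follows. The parameter values $n = 10$, $p = 3/10$ are arbitrary illustrations, and the helper names are ours.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, p: Fraction) -> dict:
    """Exact binomial PMF {k: C(n, k) p^k (1 - p)^(n - k)} using rational arithmetic."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def raw_moment(pmf: dict, r: int) -> Fraction:
    """m_r = E X^r for a finite PMF."""
    return sum(k**r * prob for k, prob in pmf.items())


n, p = 10, Fraction(3, 10)  # illustrative parameter values
pmf = binomial_pmf(n, p)
m1, m2, m3 = (raw_moment(pmf, r) for r in (1, 2, 3))
var = m2 - m1**2                     # formula (15)
mu3 = m3 - 3 * m1 * m2 + 2 * m1**3   # formula (13) with k = 3
print(m1, var, mu3)
```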


In the example above we computed factorial moments $E[X(X - 1)(X - 2)\cdots(X - k + 1)]$ for various values of $k$. For some discrete integer-valued RVs whose PMF
contains factorials or binomial coefficients, it may be more convenient to compute 
factorial moments. 

We have seen that for some distributions, even the mean does not exist. We next 
consider some parameters, called order parameters, which always exist. 
Definition 3. A number $x$ (Fig. 1) satisfying

$$P\{X \le x\} \ge p, \qquad P\{X \ge x\} \ge 1 - p, \qquad 0 < p < 1, \qquad (17)$$

is called a quantile of order $p$ [or $(100p)$th percentile] for the RV $X$ (or for the DF $F$ of $X$). We write $\zeta_p(X)$ for a quantile of order $p$ for the RV $X$.

Fig. 1. Quantile of order $p$.

If $x$ is a quantile of order $p$ for an RV $X$ with DF $F$, then

$$p \le F(x) \le p + P\{X = x\}. \qquad (18)$$

If $P\{X = x\} = 0$, as is the case in particular if $X$ is of the continuous type, a quantile of order $p$ is a solution of the equation

$$F(x) = p. \qquad (19)$$

If $F$ is strictly increasing, (19) has a unique solution. Otherwise (Fig. 2), there may be many (even uncountably many) solutions of (19), each of which is then called a quantile of order $p$. Quantiles are of a great deal of interest in testing hypotheses.
Definition 4. Let $X$ be an RV with DF $F$. A number $x$ satisfying
$$\tfrac{1}{2} \le F(x) \le \tfrac{1}{2} + P\{X = x\} \tag{20}$$
or, equivalently,
$$P\{X \le x\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X \ge x\} \ge \tfrac{1}{2} \tag{21}$$
is called a median of $X$ (or $F$).

Fig. 2. (a) Unique quantile; (b) infinitely many solutions of F(x) = p.


Again we note that there may be many values that satisfy (20) or (21). Thus a 
median is not necessarily unique. 

If F is a symmetric DF, the center of symmetry is clearly the median of the DF F. 
The median is an important centering constant, especially in cases where the mean 
of the distribution does not exist. 
Example 6. Let $X$ be an RV with Cauchy PDF
$$f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$
Then $E|X|$ is not finite, but $E|X|^{\delta} < \infty$ for $0 < \delta < 1$. The median of the RV $X$ is clearly $x = 0$.


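Example 7. Let $X$ be an RV with PMF
$$P\{X = -2\} = P\{X = 0\} = \tfrac{1}{4}, \qquad P\{X = 1\} = \tfrac{1}{3}, \qquad P\{X = 2\} = \tfrac{1}{6}.$$
Then
$$P\{X \le 0\} = \tfrac{1}{2} \quad \text{and} \quad P\{X \ge 0\} = \tfrac{3}{4} > \tfrac{1}{2}.$$
In fact, if $x$ is any number such that $0 < x < 1$, then
$$P\{X \le x\} = P\{X = -2\} + P\{X = 0\} = \tfrac{1}{2}$$
and
$$P\{X \ge x\} = P\{X = 1\} + P\{X = 2\} = \tfrac{1}{2},$$
and it follows that every $x$, $0 \le x \le 1$, is a median of the RV $X$ (at $x = 1$ we have $P\{X \le 1\} = \tfrac{5}{6}$ and $P\{X \ge 1\} = \tfrac{1}{2}$).
If $p = 0.2$, the quantile of order $p$ is $x = -2$, since
$$P\{X \le -2\} = \tfrac{1}{4} > p \quad \text{and} \quad P\{X \ge -2\} = 1 > 1 - p.$$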
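Definition 3 can be checked mechanically for a finite PMF. In the following Python sketch, `is_quantile` is a hypothetical helper introduced only for illustration. It tests both inequalities in (17) with exact fractions, using the PMF of Example 7. The first call confirms that the points of $[0, 1]$ are medians and that points outside that interval are not. The second call confirms that $x = -2$ is a quantile of order $0.2$ while $x = -1$ is not.

```python
from fractions import Fraction

def is_quantile(pmf, x, p):
    """Check (17): P{X <= x} >= p and P{X >= x} >= 1 - p for a PMF given as {value: prob}."""
    below = sum(w for v, w in pmf.items() if v <= x)
    above = sum(w for v, w in pmf.items() if v >= x)
    return below >= p and above >= 1 - p

# PMF of Example 7
pmf = {-2: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 3), 2: Fraction(1, 6)}
half = Fraction(1, 2)
medians = [is_quantile(pmf, x, half) for x in (-1, 0, Fraction(1, 2), 1, Fraction(3, 2))]
print(medians)                                   # [False, True, True, True, False]
print(is_quantile(pmf, -2, Fraction(1, 5)), is_quantile(pmf, -1, Fraction(1, 5)))
```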


PROBLEMS 3.2 


1. Find the expected number of throws of a fair die until a 6 is obtained. 


2. From a box containing $N$ identical tickets numbered 1 through $N$, $n$ tickets are drawn with replacement. Let $X$ be the largest number drawn. Find $EX$.


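3. Let $X$ be an RV with PDF
$$f(x) = \frac{c}{(1 + x^2)^m}, \qquad -\infty < x < \infty,\ m \ge 1,$$
where $c = \Gamma(m)/[\Gamma(\tfrac{1}{2})\Gamma(m - \tfrac{1}{2})]$. Show that $EX^{2r}$ exists if and only if $2r < 2m - 1$. What is $EX^{2r-1}$ if $2r < 2m - 1$?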


4. Let $X$ be an RV with PDF
$$f(x) = \begin{cases} \dfrac{k a^k}{(x + a)^{k+1}} & \text{if } x \ge 0, \\ 0 & \text{otherwise} \end{cases} \qquad (a > 0).$$
Show that $E|X|^{\alpha} < \infty$ for $\alpha < k$. Find the quantile of order $p$ for the RV $X$.


5. Let $X$ be an RV such that $E|X| < \infty$. Show that $E|X - c|$ is minimized if we choose $c$ equal to the median of the distribution of $X$.


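6. Pareto's distribution with parameters $\alpha$ and $\beta$ (both $\alpha$ and $\beta$ positive) is defined by the PDF
$$f(x) = \begin{cases} \dfrac{\beta \alpha^{\beta}}{x^{\beta + 1}} & \text{if } x \ge \alpha, \\ 0 & \text{if } x < \alpha. \end{cases}$$
Show that the moment of order $n$ exists if and only if $n < \beta$. Let $\beta > 2$. Find the mean and the variance of the distribution.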


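7. For an RV $X$ with PDF
$$f(x) = \begin{cases} x/2 & \text{if } 0 < x \le 1, \\ 1/2 & \text{if } 1 < x \le 2, \\ (3 - x)/2 & \text{if } 2 < x \le 3, \\ 0 & \text{otherwise}, \end{cases}$$
show that moments of all orders exist. Find the mean and the variance of $X$.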


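8. For the PMF of Example 5, show that
$$EX^4 = np + 7n(n - 1)p^2 + 6n(n - 1)(n - 2)p^3 + n(n - 1)(n - 2)(n - 3)p^4$$
and
$$\mu_4 = 3(npq)^2 + npq(1 - 6pq),$$
where $0 < p < 1$, $q = 1 - p$.

9. For the Poisson RV $X$ with PMF
$$P\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,$$
show that $EX = \lambda$, $EX^2 = \lambda + \lambda^2$, $EX^3 = \lambda + 3\lambda^2 + \lambda^3$, $EX^4 = \lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4$, and $\mu_2 = \mu_3 = \lambda$, $\mu_4 = \lambda + 3\lambda^2$.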


10. For any RV $X$ with $E|X|^4 < \infty$, define
$$\alpha_3 = \frac{\mu_3}{(\mu_2)^{3/2}}, \qquad \alpha_4 = \frac{\mu_4}{\mu_2^2}.$$
Here $\alpha_3$ is known as the coefficient of skewness and is sometimes used as a measure of asymmetry, and $\alpha_4$ is known as kurtosis and is used to measure the peakedness ("flatness of the top") of a distribution. Compute $\alpha_3$ and $\alpha_4$ for the PMFs of Problems 8 and 9.

11. For a positive RV $X$ define the negative moment of order $n$ by $EX^{-n}$, where $n > 0$ is an integer. Find $E[1/(X + 1)]$ for the PMFs of Example 5 and Problem 9.

12. Prove Theorem 6.

13. Prove Theorem 7.

14. In each of the following cases, compute $EX$, $\operatorname{var}(X)$, and $EX^n$ (for $n > 0$, an integer) whenever they exist:
(a) $f(x) = 1/2$, $-1 < x < 1$, and zero elsewhere.
(b) $f(x) = e^{-x}$, $x \ge 0$, and zero elsewhere.
(c) $f(x) = (k - 1)/x^k$, $x \ge 1$, and zero elsewhere; $k > 1$ is a constant.
(d) $f(x) = 1/[\pi(1 + x^2)]$, $-\infty < x < \infty$.
(e) $f(x) = 6x(1 - x)$, $0 < x < 1$, and zero elsewhere.
(f) $f(x) = x e^{-x}$, $x \ge 0$, and zero elsewhere.
(g) $P(X = x) = p(1 - p)^{x-1}$, $x = 1, 2, \ldots$, and zero elsewhere; $0 < p < 1$.

15. Find the quantile of order $p$ $(0 < p < 1)$ for the following distributions.
(a) $f(x) = 1/x^2$, $x \ge 1$, and zero elsewhere.
(b) $f(x) = 2x \exp(-x^2)$, $x \ge 0$, and zero otherwise.
(c) $f(x) = 1/\theta$, $0 \le x \le \theta$, and zero elsewhere.
(d) $P(X = x) = \theta(1 - \theta)^{x-1}$, $x = 1, 2, \ldots$, and zero otherwise; $0 < \theta < 1$.
(e) $f(x) = (1/\beta^2) x \exp(-x/\beta)$, $x \ge 0$, and zero otherwise; $\beta > 0$.
(f) $f(x) = (3/b^3)(b - x)^2$, $0 \le x \le b$, and zero elsewhere.


3.3 GENERATING FUNCTIONS


In this section we consider some functions that generate probabilities or moments of an RV. The simplest type of generating function in probability theory is the one associated with integer-valued RVs. Let $X$ be an RV, and let
$$p_k = P\{X = k\}, \qquad k = 0, 1, 2, \ldots.$$

Definition 1. The function defined by
$$P(s) = \sum_{k=0}^{\infty} p_k s^k, \tag{1}$$
which surely converges for $|s| \le 1$, is called the probability generating function (PGF) of $X$.


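Example 1. Consider the Poisson RV with PMF
$$P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots.$$
We have
$$P(s) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{(\lambda s)^k}{k!} = e^{-\lambda} e^{\lambda s} = e^{-\lambda(1 - s)} \qquad \text{for all } s.$$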


Example 2. Let $X$ be an RV with geometric distribution, that is, let
$$P\{X = k\} = p q^k, \qquad k = 0, 1, 2, \ldots;\ 0 < p < 1,\ q = 1 - p.$$
Then
$$P(s) = \sum_{k=0}^{\infty} s^k p q^k = \frac{p}{1 - sq}, \qquad |s| \le 1.$$


Remark 1. Since $P(1) = 1$, series (1) is uniformly and absolutely convergent in $|s| \le 1$, and the PGF $P$ is a continuous function of $s$. The PGF determines the PMF $\{p_k\}$ uniquely, since $P(s)$ can be represented in a unique manner as a power series.


Remark 2. Since a power series with radius of convergence $r$ can be differentiated termwise any number of times in $(-r, r)$, it follows that
$$P^{(k)}(s) = \sum_{n=k}^{\infty} n(n - 1)\cdots(n - k + 1) P\{X = n\} s^{n-k},$$
where $P^{(k)}$ is the $k$th derivative of $P$. The series converges at least for $-1 < s < 1$. For $s = 1$ the right side reduces formally to $E[X(X - 1)\cdots(X - k + 1)]$, which is the $k$th factorial moment of $X$ whenever it exists. In particular, if $EX < \infty$, then $P'(1) = EX$, and if $EX^2 < \infty$, then $P''(1) = EX(X - 1)$ and
$$\operatorname{var}(X) = EX^2 - (EX)^2 = P''(1) - [P'(1)]^2 + P'(1).$$
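Remark 2 can be illustrated on a finite-support PMF, where termwise differentiation at $s = 1$ is an exact finite sum. The Python sketch below is an illustration, not a method from the text. It uses the binomial$(8, 1/4)$ PMF as an arbitrary example, and `pgf_derivative_at_one` is a hypothetical helper name. It evaluates $P'(1)$ and $P''(1)$ as factorial moments and then applies $\operatorname{var}(X) = P''(1) - [P'(1)]^2 + P'(1)$.

```python
from fractions import Fraction
from math import comb

def pgf_derivative_at_one(pmf, k):
    """k-th derivative of P(s) = sum_n p_n s^n at s = 1, by termwise differentiation:
    the sum over n of n(n-1)...(n-k+1) p_n, i.e. the k-th factorial moment."""
    total = Fraction(0)
    for n, p_n in enumerate(pmf):
        falling = 1
        for j in range(k):
            falling *= n - j          # builds n(n-1)...(n-k+1); it is 0 when n < k
        total += falling * p_n
    return total

# Binomial(8, 1/4) PMF as an illustrative finite-support example
n, p = 8, Fraction(1, 4)
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

d1 = pgf_derivative_at_one(pmf, 1)      # P'(1) = EX
d2 = pgf_derivative_at_one(pmf, 2)      # P''(1) = EX(X - 1)
variance = d2 - d1**2 + d1              # var(X) = P''(1) - [P'(1)]^2 + P'(1)
print(d1, d2, variance)                 # 2 7/2 3/2
```

The values agree with $np = 2$, $n(n - 1)p^2 = 7/2$, and $np(1 - p) = 3/2$ from Example 5 of Section 3.2.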


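Example 3. In Example 1 we found that $P(s) = e^{-\lambda(1 - s)}$ for a Poisson RV. Thus
$$P'(s) = \lambda e^{-\lambda(1 - s)}, \qquad P''(s) = \lambda^2 e^{-\lambda(1 - s)}.$$
Also, $EX = \lambda$ and $E(X^2 - X) = \lambda^2$, so that $\operatorname{var}(X) = EX^2 - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.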
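In Example 2 we computed $P(s) = p/(1 - sq)$, so that
$$P'(s) = \frac{pq}{(1 - sq)^2} \quad \text{and} \quad P''(s) = \frac{2pq^2}{(1 - sq)^3}.$$
Thus
$$EX = \frac{q}{p}, \qquad EX^2 = \frac{q}{p} + \frac{2q^2}{p^2}, \qquad \text{and} \qquad \operatorname{var}(X) = \frac{q}{p} + \frac{q^2}{p^2} = \frac{q}{p^2}.$$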
Example 4. Consider the PGF
$$P(s) = \left(\frac{1 + s}{2}\right)^n, \qquad -\infty < s < \infty.$$
Expanding the right side into a power series, we get
$$P(s) = \frac{1}{2^n} \sum_{k=0}^{n} \binom{n}{k} s^k = \sum_{k=0}^{n} p_k s^k,$$
and it follows that
$$p_k = P\{X = k\} = \binom{n}{k} 2^{-n}, \qquad k = 0, 1, \ldots, n.$$

We note that the PGF, being defined only for discrete integer-valued RVs, has limited 
utility. We next consider a generating function that is quite useful in probability and 
statistics. 


Definition 2. Let $X$ be an RV defined on $(\Omega, \mathcal{S}, P)$. The function
$$M(s) = E e^{sX} \tag{2}$$
is known as the moment generating function (MGF) of the RV $X$ if the expectation on the right side of (2) exists in some neighborhood of the origin.


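Example 5. Let $X$ have the PMF
$$f(x) = \begin{cases} \dfrac{6}{\pi^2} \cdot \dfrac{1}{x^2}, & x = 1, 2, \ldots, \\ 0, & \text{otherwise}. \end{cases}$$
Then $(6/\pi^2) \sum_{x=1}^{\infty} e^{sx}/x^2$ is infinite for every $s > 0$. We see that the MGF of $X$ does not exist. In fact, $EX = \infty$.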


Example 6. Let $X$ have the PDF
$$f(x) = \begin{cases} \tfrac{1}{2} e^{-x/2}, & x > 0, \\ 0, & \text{otherwise}. \end{cases}$$
Then
$$M(s) = \frac{1}{2} \int_0^{\infty} e^{-(1 - 2s)x/2}\, dx = \frac{1}{1 - 2s}, \qquad s < \frac{1}{2}.$$
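Example 7. Let $X$ have the PMF
$$P\{X = k\} = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots, \\ 0, & \text{otherwise}. \end{cases}$$
Then
$$M(s) = E e^{sX} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^s)^k}{k!} = e^{-\lambda(1 - e^s)} \qquad \text{for all } s.$$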
The following result will be quite useful subsequently. 


Theorem 1. The MGF uniquely determines a DF and, conversely, if the MGF 
exists, it is unique. 


For the proof we refer the reader to Widder [116, p. 460] or Curtiss [18]. Theorem 2 explains why we call $M(s)$ an MGF.


Theorem 2. If the MGF $M(s)$ of an RV $X$ exists for $s$ in $(-s_0, s_0)$, say, $s_0 > 0$, the derivatives of all orders exist at $s = 0$ and can be evaluated under the integral sign, that is,
$$M^{(k)}(s)\big|_{s=0} = EX^k \qquad \text{for positive integral } k. \tag{3}$$

For the proof of Theorem 2, we refer to Widder [116, pp. 446-447]. See also Problem 9.


Remark 3. Alternatively, if the MGF $M(s)$ exists for $s$ in $(-s_0, s_0)$, say, $s_0 > 0$, one can express $M(s)$ (uniquely) in a Maclaurin series expansion:
$$M(s) = M(0) + \frac{M'(0)}{1!} s + \frac{M''(0)}{2!} s^2 + \cdots, \tag{4}$$
so that $EX^k$ is the coefficient of $s^k/k!$ in expansion (4).


Example 8. Let $X$ be an RV with PDF $f(x) = \tfrac{1}{2} e^{-x/2}$, $x > 0$. From Example 6, $M(s) = 1/(1 - 2s)$ for $s < \tfrac{1}{2}$. Thus
$$M'(s) = \frac{2}{(1 - 2s)^2} \quad \text{and} \quad M''(s) = \frac{8}{(1 - 2s)^3}, \qquad s < \frac{1}{2}.$$
It follows that
$$EX = 2, \qquad EX^2 = 8, \qquad \text{and} \qquad \operatorname{var}(X) = 4.$$
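Theorem 2 says the moments are derivatives of $M$ at the origin. For Example 8 this can be sanity-checked numerically with central finite differences. The Python sketch below is only an approximation. The step size $h = 10^{-3}$ is an arbitrary assumption, and the true values $2$ and $8$ are recovered only up to an error of order $h^2$.

```python
def M(s):
    """MGF of Examples 6 and 8: M(s) = 1/(1 - 2s), valid for s < 1/2."""
    return 1.0 / (1.0 - 2.0 * s)

h = 1e-3                                         # illustrative step size
first = (M(h) - M(-h)) / (2 * h)                 # central difference for M'(0) = EX
second = (M(h) - 2 * M(0.0) + M(-h)) / h**2      # central difference for M''(0) = EX^2
mean, second_moment = first, second
variance = second_moment - mean**2
print(round(mean, 4), round(second_moment, 4), round(variance, 4))
```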


Example 9. Let $X$ be an RV with PDF $f(x) = 1$, $0 < x < 1$, and $= 0$ otherwise. Then
$$M(s) = \int_0^1 e^{sx}\, dx = \frac{e^s - 1}{s}, \qquad s \ne 0,$$
with $M(0) = 1$,
$$M'(s) = \frac{s e^s - (e^s - 1)}{s^2},$$
and
$$EX = M'(0) = \lim_{s \to 0} \frac{s e^s - e^s + 1}{s^2} = \frac{1}{2}.$$


We emphasize that the expectation $Ee^{sX}$ does not exist unless $s$ is carefully restricted. In fact, the requirement that $M(s)$ exist in a neighborhood of zero is a very strong requirement that is not satisfied by some common distributions. We next consider a generating function that exists for all distributions.


Definition 3. Let $X$ be an RV. The complex-valued function $\varphi$ defined on $\mathbb{R}$ by
$$\varphi(t) = E(e^{itX}) = E(\cos tX) + iE(\sin tX), \qquad t \in \mathbb{R},$$
where $i = \sqrt{-1}$ is the imaginary unit, is called the characteristic function (CF) of the RV $X$.
Clearly,
$$\varphi(t) = \sum_k (\cos t x_k + i \sin t x_k) P\{X = x_k\}$$
in the discrete case, and
$$\varphi(t) = \int_{-\infty}^{\infty} \cos tx\, f(x)\, dx + i \int_{-\infty}^{\infty} \sin tx\, f(x)\, dx$$
in the continuous case.
Example 10. Let $X$ be a normal RV with PDF
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad x \in \mathbb{R}.$$
Then
$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos tx\, e^{-x^2/2}\, dx + \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sin tx\, e^{-x^2/2}\, dx.$$
Note that $\sin tx$ is an odd function of $x$, and so also is $\sin tx\, e^{-x^2/2}$. Thus the second integral on the right side vanishes and we have
$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos tx\, e^{-x^2/2}\, dx = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} \cos tx\, e^{-x^2/2}\, dx = e^{-t^2/2}, \qquad t \in \mathbb{R}.$$
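The closed form $\varphi(t) = e^{-t^2/2}$ can be compared against a direct numerical evaluation of the two integrals in Definition 3. The following Python sketch uses the trapezoidal rule on $[-12, 12]$. The truncation interval and the step count are illustrative assumptions. Because the grid is symmetric about $0$, the imaginary part comes out numerically zero, in line with the oddness argument above.

```python
import math

def normal_cf(t, a=-12.0, b=12.0, steps=24000):
    """Approximate phi(t) = E cos(tX) + i E sin(tX) for X ~ N(0, 1) by the trapezoidal rule."""
    h = (b - a) / steps
    re = im = 0.0
    for j in range(steps + 1):
        x = a + j * h
        w = 0.5 if j in (0, steps) else 1.0     # trapezoid end-point weights
        dens = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        re += w * math.cos(t * x) * dens
        im += w * math.sin(t * x) * dens
    return complex(re * h, im * h)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, normal_cf(t), math.exp(-t * t / 2))
```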

Remark 4. Unlike an MGF, which may not exist for some distributions, a CF always exists, which makes it a much more convenient tool. In fact, it is easy to see that $\varphi$ is continuous on $\mathbb{R}$, $|\varphi(t)| \le 1$ for all $t$, and $\varphi(-t) = \overline{\varphi(t)}$, where $\overline{\varphi}$ is the complex conjugate of $\varphi$. Thus $\overline{\varphi}$ is the CF of $-X$. Moreover, $\varphi$ uniquely determines the DF of the RV $X$. For these and many other properties of characteristic functions, we need a comprehensive knowledge of complex variable theory, well beyond the scope of this book. We refer the reader to Lukacs [68].


Finally, we consider the problem of characterizing a distribution from its moments. Given a set of constants $\{\mu_0 = 1, \mu_1, \mu_2, \ldots\}$, the problem of moments asks if they can be moments of a distribution function $F$. At this point it will be worthwhile to take note of some facts.

First, we have seen that if $M(s) = Ee^{sX}$ exists for some $X$ for $s$ in some neighborhood of zero, then $E|X|^n < \infty$ for all $n \ge 1$. Suppose, however, that $E|X|^n < \infty$ for all $n \ge 1$. It does not follow that the MGF of $X$ exists.


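Example 11. Let $X$ be an RV with PDF
$$f(x) = c\, e^{-|x|^{\alpha}}, \qquad 0 < \alpha < 1,\ -\infty < x < \infty,$$
where $c$ is a constant determined from
$$c \int_{-\infty}^{\infty} e^{-|x|^{\alpha}}\, dx = 1.$$
Let $s > 0$. Then
$$\int_0^{\infty} e^{sx} e^{-x^{\alpha}}\, dx = \int_0^{\infty} e^{x(s - x^{\alpha - 1})}\, dx,$$
and since $\alpha - 1 < 0$, we have $x^{\alpha - 1} \to 0$ as $x \to \infty$, so the exponent $x(s - x^{\alpha - 1})$ tends to $\infty$ and $\int_0^{\infty} e^{x(s - x^{\alpha - 1})}\, dx$ is not finite for any $s > 0$. Hence the MGF does not exist. But
$$E|X|^n = c \int_{-\infty}^{\infty} |x|^n e^{-|x|^{\alpha}}\, dx = 2c \int_0^{\infty} x^n e^{-x^{\alpha}}\, dx < \infty \qquad \text{for each } n,$$
as is easily checked by substituting $y = x^{\alpha}$.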
Second, two (or more) RVs may have the same set of moments. 
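Example 12. Let $X$ have lognormal PDF
$$f(x) = (x\sqrt{2\pi})^{-1} e^{-(\log x)^2/2}, \qquad x > 0,$$
and $f(x) = 0$ for $x \le 0$. Let $X_{\varepsilon}$, $|\varepsilon| \le 1$, have PDF
$$f_{\varepsilon}(x) = f(x)[1 + \varepsilon \sin(2\pi \log x)], \qquad x \in \mathbb{R}.$$
[Note that $f_{\varepsilon} \ge 0$ for all $\varepsilon$, $|\varepsilon| \le 1$, and $\int_{-\infty}^{\infty} f_{\varepsilon}(x)\, dx = 1$, so $f_{\varepsilon}$ is a PDF.] Since, however, substituting $t = \log x$ and then $y = t - k$,
$$\int_0^{\infty} x^k f(x) \sin(2\pi \log x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(t^2/2) + kt} \sin(2\pi t)\, dt = \frac{e^{k^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2} \sin(2\pi y)\, dy = 0$$
(the last integrand is odd, and $\sin(2\pi(y + k)) = \sin(2\pi y)$ for integer $k$), we see that
$$\int_0^{\infty} x^k f_{\varepsilon}(x)\, dx = \int_0^{\infty} x^k f(x)\, dx$$
for all $\varepsilon$, $|\varepsilon| \le 1$, and $k = 0, 1, 2, \ldots$. But $f(x) \ne f_{\varepsilon}(x)$ for $\varepsilon \ne 0$.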
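The key step in Example 12 is that the perturbation $\sin(2\pi \log x)$ contributes nothing to any integer moment. The Python sketch below checks this numerically after the substitution $t = \log x$. It approximates $(2\pi)^{-1/2} \int e^{-t^2/2 + kt} \sin(2\pi t)\, dt$ by the trapezoidal rule on $[-20, 20]$, where the interval and step count are assumptions, and prints it next to the lognormal moment $EX^k = e^{k^2/2}$.

```python
import math

def perturbation_moment(k, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal approximation of (2 pi)^(-1/2) * integral of exp(-t^2/2 + k t) sin(2 pi t) dt,
    i.e. the k-th moment integral of f(x) sin(2 pi log x) after substituting t = log x."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        t = lo + j * h
        w = 0.5 if j in (0, steps) else 1.0
        total += w * math.exp(-t * t / 2 + k * t) * math.sin(2 * math.pi * t)
    return total * h / math.sqrt(2 * math.pi)

for k in range(4):
    lognormal_moment = math.exp(k * k / 2)        # E X^k for the lognormal PDF of Example 12
    print(k, lognormal_moment, perturbation_moment(k))
```

The perturbation integrals come out at rounding-error level, so $f$ and every $f_{\varepsilon}$ share the moments $e^{k^2/2}$.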

Third, moments of any RV $X$ necessarily satisfy certain conditions. For example, if $\beta_{\nu} = E|X|^{\nu}$, we will see (Theorem 3.4.3) that $\beta_{\nu}^{1/\nu}$ is an increasing function of $\nu$. Similarly, the quadratic form
$$E\left(\sum_{i=1}^{n} t_i X^i\right)^2 \ge 0$$
yields a relation between moments of various orders of $X$.
The following result, which we do not prove here, gives a sufficient condition for unique determination of $F$ from its moments.

Theorem 3. Let $\{m_k\}$ be the moment sequence of an RV $X$. If the series
$$\sum_{k=1}^{\infty} \frac{m_k}{k!} s^k \tag{5}$$
converges absolutely for some $s > 0$, then $\{m_k\}$ uniquely determines the DF $F$ of $X$.
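Example 13. Suppose that $X$ has PDF
$$f(x) = e^{-x} \text{ for } x \ge 0 \quad \text{and} \quad = 0 \text{ for } x < 0.$$
Then $EX^k = \int_0^{\infty} x^k e^{-x}\, dx = k!$, and from Theorem 3,
$$\sum_{k=1}^{\infty} \frac{m_k}{k!} s^k = \sum_{k=1}^{\infty} s^k = \frac{s}{1 - s} < \infty$$
for $0 < s < 1$, so that $\{m_k\}$ determines $F$ uniquely. In fact, from Remark 3,
$$M(s) = \sum_{k=0}^{\infty} \frac{m_k}{k!} s^k = \frac{1}{1 - s}, \qquad 0 < s < 1,$$
which is the MGF of $X$.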








In particular, if for some constant $c$,
$$|m_k| \le c^k, \qquad k = 1, 2, \ldots,$$
then
$$\sum_{k=1}^{\infty} \frac{|m_k|}{k!} s^k \le \sum_{k=1}^{\infty} \frac{(cs)^k}{k!} < e^{cs} \qquad \text{for } s > 0,$$
and the DF of $X$ is determined uniquely. Thus if $P\{|X| \le c\} = 1$ for some $c > 0$, then all moments of $X$ exist, satisfying $|m_k| \le c^k$, $k \ge 1$, and the DF of $X$ is determined uniquely from its moments.

Finally, we mention some sufficient conditions for a moment sequence to determine a unique DF.

(i) The range of the RV is finite.
(ii) (Carleman) $\sum_{k=1}^{\infty} (m_{2k})^{-1/(2k)} = \infty$ when the range of the RV is $(-\infty, \infty)$. If the range is $(0, \infty)$, a sufficient condition is $\sum_{k=1}^{\infty} (m_k)^{-1/(2k)} = \infty$.
(iii) $\lim_{n \to \infty} \left[(m_{2n})^{1/(2n)}/(2n)\right]$ is finite.
PROBLEMS 3.3

1. Find the PGF of the RVs with the following PMFs:
(a) $P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$; $0 < p < 1$.
(b) $P\{X = k\} = [e^{-\lambda}/(1 - e^{-\lambda})] \lambda^k/k!$, $k = 1, 2, \ldots$; $\lambda > 0$.
(c) $P\{X = k\} = p q^k (1 - q^{N+1})^{-1}$, $k = 0, 1, 2, \ldots, N$; $0 < p < 1$, $q = 1 - p$.

2. Let $X$ be an integer-valued RV with PGF $P(s)$. Let $\alpha$ and $\beta$ be nonnegative integers, and write $Y = \alpha X + \beta$. Find the PGF of $Y$.

3. Let $X$ be an integer-valued RV with PGF $P(s)$, and suppose that the MGF $M(s)$ exists for $s \in (-s_0, s_0)$, $s_0 > 0$. How are $M(s)$ and $P(s)$ related? Using $M^{(k)}(s)\big|_{s=0} = EX^k$ for positive integral $k$, find $EX^k$ in terms of the derivatives of $P(s)$ for values of $k = 1, 2, 3, 4$.


4. For the Cauchy PDF
$$f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \qquad -\infty < x < \infty,$$
does the MGF exist?
5. Let $X$ be an RV with PMF
$$P\{X = j\} = p_j, \qquad j = 0, 1, 2, \ldots.$$
Set $P\{X > j\} = q_j$, $j = 0, 1, 2, \ldots$. Clearly, $q_j = p_{j+1} + p_{j+2} + \cdots$, $j \ge 0$. Write $Q(s) = \sum_{j=0}^{\infty} q_j s^j$. Then the series for $Q(s)$ converges in $|s| < 1$. Show that
$$Q(s) = \frac{1 - P(s)}{1 - s} \qquad \text{for } |s| < 1,$$
where $P(s)$ is the PGF of $X$. Find the mean and the variance of $X$ (when they exist) in terms of $Q$ and its derivatives.


6. For the PMF
$$P\{X = j\} = \frac{a_j \theta^j}{f(\theta)}, \qquad j = 0, 1, 2, \ldots,\ \theta > 0,$$
where $a_j \ge 0$ and $f(\theta) = \sum_{j=0}^{\infty} a_j \theta^j$, find the PGF and the MGF in terms of $f$.


7. For the Laplace PDF
$$f(x) = \frac{1}{2\lambda} e^{-|x - \mu|/\lambda}, \qquad -\infty < x < \infty;\ \lambda > 0,\ -\infty < \mu < \infty,$$
show that the MGF exists and equals
$$M(t) = (1 - \lambda^2 t^2)^{-1} e^{\mu t}, \qquad |t| < \frac{1}{\lambda}.$$


8. For any integer-valued RV $X$, show that
$$\sum_{n=0}^{\infty} s^n P\{X \le n\} = (1 - s)^{-1} P(s),$$
where $P$ is the PGF of $X$.


9. Let $X$ be an RV with MGF $M(t)$, which exists for $t \in (-t_0, t_0)$, $t_0 > 0$. Show that
$$E|X|^n \le n!\, s^{-n} [M(s) + M(-s)]$$
for any fixed $s$, $0 < s < t_0$, and for each integer $n \ge 1$. Expanding $e^{tx}$ in a power series, show that for $t \in (-s, s)$, $0 < s < t_0$,
$$M(t) = \sum_{n=0}^{\infty} \frac{t^n EX^n}{n!}.$$
[Since a power series can be differentiated term by term within the interval of convergence, it follows that for $|t| < s$,
$$M^{(k)}(t)\big|_{t=0} = EX^k$$
for each integer $k \ge 1$.] (Roy, LePage, and Moore [93])
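10. Let $X$ be an integer-valued random variable with
$$E\{X(X - 1)\cdots(X - k + 1)\} = \begin{cases} \dfrac{n!}{(n - k)!} & \text{if } k = 0, 1, 2, \ldots, n, \\ 0 & \text{if } k > n. \end{cases}$$
Show that $X$ must be degenerate at $n$. [Hint: Prove and use the fact that if $EX^k < \infty$ for all $k$, then
$$P(s) = \sum_{k=0}^{\infty} \frac{(s - 1)^k}{k!} E\{X(X - 1)\cdots(X - k + 1)\}.$$
Write $P(s)$ as
$$P(s) = \sum_{k=0}^{\infty} P\{X = k\} s^k = \sum_{k=0}^{\infty} P\{X = k\} \sum_{i=0}^{k} \binom{k}{i} (s - 1)^i = \sum_{i=0}^{\infty} (s - 1)^i \sum_{k=i}^{\infty} \binom{k}{i} P\{X = k\}.$$
]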
11. Let $p(n, k) = f(n, k)/n!$, where $f(n, k)$ is given by
$$f(n + 1, k) = f(n, k) + f(n, k - 1) + \cdots + f(n, k - n)$$
for $k = 0, 1, \ldots, \binom{n+1}{2}$, and
$$f(n, k) = 0 \text{ for } k < 0, \qquad f(1, 0) = 1, \qquad f(1, k) = 0 \text{ otherwise}.$$
Let
$$P_n(s) = \frac{1}{n!} \sum_{k=0}^{\binom{n}{2}} s^k f(n, k)$$
be the probability generating function of $p(n, k)$. Show that
$$P_n(s) = \frac{1}{n!} \prod_{k=2}^{n} \frac{1 - s^k}{1 - s}, \qquad |s| < 1.$$
($P_n$ is the generating function of Kendall's $\tau$-statistic.)
12. For $k = 0, 1, \ldots, \binom{n+1}{2}$, let $u_n(k)$ be defined recursively by
$$u_n(k) = u_{n-1}(k - n) + u_{n-1}(k)$$
with $u_0(0) = 1$, $u_0(k) = 0$ otherwise, and $u_n(k) = 0$ for $k < 0$. Let $P_n(s) = \sum_{k=0}^{\binom{n+1}{2}} s^k u_n(k)$ be the generating function of $\{u_n\}$. Show that
$$P_n(s) = \prod_{j=1}^{n} (1 + s^j) \qquad \text{for } |s| < 1.$$
If $p_n(k) = u_n(k)/2^n$, find $\{p_n(k)\}$ for $n = 2, 3, 4$. ($P_n$ is the generating function of the one-sample Wilcoxon test statistic.)


3.4 SOME MOMENT INEQUALITIES
In this section we derive some inequalities for moments of an RV. The main result of this section is Theorem 1 (and its corollary), which gives a bound for tail probability in terms of some moment of the random variable.

Theorem 1. Let $h(X)$ be a nonnegative Borel-measurable function of an RV $X$. If $Eh(X)$ exists, then for every $\varepsilon > 0$,
$$P\{h(X) \ge \varepsilon\} \le \frac{Eh(X)}{\varepsilon}. \tag{1}$$





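Proof. We prove the result when $X$ is discrete. Let $P\{X = x_k\} = p_k$, $k = 1, 2, \ldots$. Then
$$Eh(X) = \sum_k h(x_k) p_k = \sum_A h(x_k) p_k + \sum_{A^c} h(x_k) p_k,$$
where
$$A = \{k : h(x_k) \ge \varepsilon\}.$$
Then
$$Eh(X) \ge \sum_A h(x_k) p_k \ge \varepsilon \sum_A p_k = \varepsilon P\{h(X) \ge \varepsilon\}.$$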


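Corollary. Let $h(X) = |X|^r$ and $\varepsilon = K^r$, where $r > 0$ and $K > 0$. Then
$$P\{|X| \ge K\} \le \frac{E|X|^r}{K^r}, \tag{2}$$
which is Markov's inequality. In particular, if we take $h(X) = (X - \mu)^2$, $\varepsilon = K^2\sigma^2$, we get the Chebychev-Bienayme inequality:
$$P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}, \tag{3}$$
where $EX = \mu$, $\operatorname{var}(X) = \sigma^2$.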


Remark 1. The inequality (3) is generally attributed to Chebychev, although re- 
cent research has shown that credit should also go to I. J. Bienayme. 


Remark 2. If we wish to be consistent with our definition of a DF as $F_X(x) = P\{X \le x\}$, then we may want to reformulate (1) in the following form:
$$P\{h(X) > \varepsilon\} \le \frac{Eh(X)}{\varepsilon}.$$

For RVs with finite second-order moments, one cannot do better than the inequality in (3).
Example 1. Let
$$P\{X = 0\} = 1 - \frac{1}{K^2}, \qquad P\{X = \pm 1\} = \frac{1}{2K^2}, \qquad K > 1 \text{ constant}.$$
Then
$$EX = 0, \qquad EX^2 = \frac{1}{K^2} = \sigma^2, \qquad \sigma = \frac{1}{K},$$
and
$$P\{|X| \ge K\sigma\} = P\{|X| \ge 1\} = \frac{1}{K^2},$$
so that equality is achieved.
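The equality case in Example 1 can be verified in exact arithmetic. In this Python sketch, `three_point` is a hypothetical helper and $K = 3$ is an arbitrary choice. Since $\sigma = 1/K$ exactly, the event $\{|X - \mu| \ge K\sigma\}$ is simply $\{|X| \ge 1\}$, so no square roots are needed.

```python
from fractions import Fraction

def three_point(K):
    """PMF of Example 1: P{X=0} = 1 - 1/K^2, P{X=1} = P{X=-1} = 1/(2K^2)."""
    K2 = Fraction(K) ** 2
    return {0: 1 - 1 / K2, 1: 1 / (2 * K2), -1: 1 / (2 * K2)}

K = 3
pmf = three_point(K)
mu = sum(x * w for x, w in pmf.items())
var = sum((x - mu) ** 2 * w for x, w in pmf.items())
# sigma = 1/K exactly, so the event |X - mu| >= K*sigma is |X| >= 1
tail = sum(w for x, w in pmf.items() if abs(x - mu) >= 1)
print(mu, var, tail, Fraction(1, K**2))   # 0 1/9 1/9 1/9
```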


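Example 2. Let $X$ be distributed with PDF $f(x) = 1$ if $0 < x < 1$, and $= 0$ otherwise. Then $EX = \tfrac{1}{2}$, $\operatorname{var}(X) = \tfrac{1}{12}$, and
$$P\left\{\left|X - \frac{1}{2}\right| < 2\sqrt{\frac{1}{12}}\right\} = P\left\{\frac{1}{2} - \frac{1}{\sqrt{3}} < X < \frac{1}{2} + \frac{1}{\sqrt{3}}\right\} = 1.$$
From Chebychev's inequality
$$P\left\{\left|X - \frac{1}{2}\right| < 2\sqrt{\frac{1}{12}}\right\} \ge 1 - \frac{1}{4} = 0.75.$$
In Fig. 1 we compare the upper bound for $P\{|X - \tfrac{1}{2}| \ge k/\sqrt{12}\}$ with the exact probability.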
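The comparison described for Fig. 1 can be tabulated directly. For $X$ uniform on $(0, 1)$ and $d = k\sigma \le 1/2$, the exact tail is $P\{|X - \tfrac{1}{2}| \ge d\} = 1 - 2d$. The tail is $0$ once $k \ge \sqrt{3}$. The Python sketch below prints this next to the Chebychev bound $1/k^2$, truncated at 1. The grid of $k$ values is an illustrative choice.

```python
import math

sigma = 1 / math.sqrt(12)          # SD of the uniform distribution on (0, 1)

def exact_tail(k):
    """P{|X - 1/2| >= k*sigma} for X uniform on (0, 1)."""
    return max(0.0, 1 - 2 * k * sigma)

def chebychev_bound(k):
    """Bound (3), truncated at 1 since probabilities never exceed 1."""
    return min(1.0, 1 / k**2)

ks = (1.0, 1.5, math.sqrt(3), 2.0, 3.0)
for k in ks:
    print(f"k={k:.3f}  exact={exact_tail(k):.4f}  bound={chebychev_bound(k):.4f}")
```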





It is possible to improve upon Chebychev's inequality, at least in some cases, if 
we assume the existence of higher-order moments. We need the following lemma. 


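Lemma 1. Let $X$ be an RV with $EX = 0$ and $\operatorname{var}(X) = \sigma^2$. Then
$$P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2} \qquad \text{if } x > 0, \tag{4}$$
and
$$P\{X \ge x\} \ge \frac{x^2}{\sigma^2 + x^2} \qquad \text{if } x < 0. \tag{5}$$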
Fig. 1. Chebychev upper bound versus exact probability.


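Proof. Let $h(t) = (t + c)^2$, $c > 0$. Then $h(t) \ge 0$ for all $t$ and
$$h(t) \ge (x + c)^2 \qquad \text{for } t \ge x > 0.$$
It follows that
$$P\{X \ge x\} \le P\{h(X) \ge (x + c)^2\} \le \frac{E(X + c)^2}{(x + c)^2} = \frac{\sigma^2 + c^2}{(x + c)^2} \qquad \text{for all } c > 0,\ x > 0, \tag{6}$$
since $EX = 0$ and $EX^2 = \sigma^2$. The right side of (6) is minimum when $c = \sigma^2/x$, and substituting this value we have
$$P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2}, \qquad x > 0.$$
A similar proof holds for (5).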
Remark 3. Inequalities (4) and (5) cannot be improved (Problem 3). 


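Theorem 2. Let $E|X|^4 < \infty$, and let $EX = 0$, $EX^2 = \sigma^2$. Then
$$P\{|X| \ge K\sigma\} \le \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4} \qquad \text{for } K > 1, \tag{7}$$
where $\mu_4 = EX^4$.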
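Proof. For the proof, let us substitute $(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)$ for $X$ and take $x = 1$ in (4). This new variable has mean $0$ and variance $(\mu_4 - \sigma^4)/[\sigma^4(K^2 - 1)^2]$. Then
$$\begin{aligned} P\{X^2 - \sigma^2 \ge K^2\sigma^2 - \sigma^2\} &\le \frac{(\mu_4 - \sigma^4)/[\sigma^4(K^2 - 1)^2]}{1 + (\mu_4 - \sigma^4)/[\sigma^4(K^2 - 1)^2]} = \frac{\mu_4 - \sigma^4}{\sigma^4(K^2 - 1)^2 + \mu_4 - \sigma^4} \\ &= \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4}, \qquad K > 1, \end{aligned}$$
as asserted.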


Remark 4. Bound (7) is better than bound (3) if $K^2 > \mu_4/\sigma^4$ and worse if $1 < K^2 < \mu_4/\sigma^4$ (Problem 5).


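Example 3. Let $X$ have the uniform density
$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise}. \end{cases}$$
Then
$$EX = \frac{1}{2}, \qquad \operatorname{var}(X) = \frac{1}{12}, \qquad E\left(X - \frac{1}{2}\right)^4 = \frac{1}{80},$$
and applying (7) to $X - \tfrac{1}{2}$ with $K = 2$ gives
$$P\left\{\left|X - \frac{1}{2}\right| \ge 2\sqrt{\frac{1}{12}}\right\} \le \frac{1/80 - 1/144}{1/80 + 16/144 - 8/144} = \frac{4}{49},$$
that is,
$$P\left\{\left|X - \frac{1}{2}\right| < 2\sqrt{\frac{1}{12}}\right\} \ge \frac{45}{49},$$
which is much better than the bound given by Chebychev's inequality (Example 2).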
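The arithmetic of Example 3 can be reproduced with exact fractions. The following Python sketch evaluates bound (3) and bound (7) at $K = 2$ for the centered uniform variable, using $\sigma^2 = 1/12$ and $\mu_4 = 1/80$ from the example.

```python
from fractions import Fraction

sigma2 = Fraction(1, 12)   # var(X) for X uniform on (0, 1)
mu4 = Fraction(1, 80)      # E(X - 1/2)^4 for the same X
K = 2

cheb = Fraction(1, K**2)                                                        # bound (3)
bound7 = (mu4 - sigma2**2) / (mu4 + sigma2**2 * K**4 - 2 * K**2 * sigma2**2)   # bound (7)
print(cheb, bound7, 1 - bound7)   # 1/4 4/49 45/49
```

Here $\mu_4/\sigma^4 = 9/5 < K^2 = 4$, which is consistent with Remark 4: bound (7) beats bound (3) in this range.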


Theorem 3 (Lyapunov Inequality). Let $\beta_n = E|X|^n < \infty$. Then for arbitrary $k$, $2 \le k \le n$, we have
$$\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}. \tag{8}$$
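Proof. Consider the quadratic form
$$Q(u, v) = \int_{-\infty}^{\infty} \left(u|x|^{(k-1)/2} + v|x|^{(k+1)/2}\right)^2 f(x)\, dx,$$
where we have assumed that $X$ is continuous with PDF $f$. We have
$$Q(u, v) = u^2\beta_{k-1} + 2uv\beta_k + v^2\beta_{k+1}.$$
Clearly, $Q \ge 0$ for all real $u$, $v$. It follows that
$$\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0,$$
implying that
$$\beta_k^2 \le \beta_{k-1}\beta_{k+1}.$$
Thus
$$\beta_1^2 \le \beta_0\beta_2, \qquad \beta_2^4 \le \beta_1^2\beta_3^2, \qquad \ldots, \qquad \beta_{k-1}^{2(k-1)} \le \beta_{k-2}^{k-1}\beta_k^{k-1},$$
where $\beta_0 = 1$. Multiplying successive $k - 1$ of these, we have
$$\beta_{k-1}^{k} \le \beta_k^{k-1}.$$
It follows that
$$\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots \le \beta_n^{1/n}.$$
The equality holds if and only if
$$\beta_k^{1/k} = \beta_{k+1}^{1/(k+1)} \qquad \text{for } k = 1, 2, \ldots;$$
that is, $\{\beta_k^{1/k}\}$ is a constant sequence of numbers, which happens if and only if $|X|$ is degenerate; that is, for some $c$, $P\{|X| = c\} = 1$.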
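The monotonicity $\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots$ can be observed numerically for any distribution with finite moments. The Python sketch below reuses, purely as an illustrative choice, the PMF of Example 7 of Section 3.2. It computes $\beta_k^{1/k}$ for $k = 1, \ldots, 7$ and checks that the sequence is nondecreasing. It increases strictly here because $|X|$ is not degenerate.

```python
from fractions import Fraction

def abs_moment(pmf, r):
    """beta_r = E|X|^r for a finite PMF given as {value: probability}."""
    return sum(abs(x) ** r * w for x, w in pmf.items())

pmf = {-2: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 3), 2: Fraction(1, 6)}
roots = [float(abs_moment(pmf, k)) ** (1 / k) for k in range(1, 8)]
print([round(r, 4) for r in roots])
assert all(a <= b for a, b in zip(roots, roots[1:]))   # Lyapunov: beta_k^(1/k) nondecreasing
```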


PROBLEMS 3.4 


1. For the RV with PDF
$$f(x; \lambda) = \frac{e^{-x} x^{\lambda}}{\lambda!}, \qquad x > 0,$$
where $\lambda \ge 0$ is an integer, show that
$$P\{0 < X < 2(\lambda + 1)\} > \frac{\lambda}{\lambda + 1}.$$


2. Let $X$ be any RV, and suppose that the MGF of $X$, $M(t) = Ee^{tX}$, exists for every $t > 0$. Show that for any $t > 0$,
$$P\{tX \ge s^2 + \log M(t)\} \le e^{-s^2}.$$


3. Construct an example to show that inequalities (4) and (5) cannot be improved. 


4. Let $g(\cdot)$ be a function satisfying $g(x) > 0$ for $x > 0$, $g(x)$ increasing for $x > 0$, and $E|g(X)| < \infty$. Show that
$$P\{|X| \ge \varepsilon\} \le \frac{Eg(|X|)}{g(\varepsilon)}$$
for every $\varepsilon > 0$.


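5. Let $X$ be an RV with $EX = 0$, $\operatorname{var}(X) = \sigma^2$, and $EX^4 = \mu_4$. Let $K$ be any positive real number. Show that
$$P\{|X| \ge K\sigma\} \le \begin{cases} 1 & \text{if } K^2 < 1, \\ \dfrac{1}{K^2} & \text{if } 1 \le K^2 < \dfrac{\mu_4}{\sigma^4}, \\ \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4} & \text{if } K^2 \ge \dfrac{\mu_4}{\sigma^4}. \end{cases}$$
In other words, show that bound (7) is better than bound (3) if $K^2 > \mu_4/\sigma^4$ and worse if $1 < K^2 < \mu_4/\sigma^4$. Construct an example to show that the last inequalities cannot be improved.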


6. Use Chebychev's inequality to show that for any $k > 1$, $e^{k+1} > k^2$.


7. For any RV $X$, show that
$$P\{X \ge 0\} \le \inf\{g(t) : t \ge 0\} \le 1,$$
where $g(t) = Ee^{tX}$, $0 < g(t) \le \infty$.


8. Let $X$ be an RV such that $P\{a \le X \le b\} = 1$, where $-\infty < a < b < \infty$. Show that $\operatorname{var}(X) \le (b - a)^2/4$.


CHAPTER 4 


Multiple Random Variables 


4.1 INTRODUCTION 


In many experiments an observation is expressible, not as a single numerical quan- 
tity, but as a family of several separate numerical quantities. For example, if a pair of 
distinguishable dice is tossed, the outcome is a pair (x, y), where x denotes the face 
value on the first die, and y, the face value on the second die. Similarly, to record 
the height and weight of every person in a certain community, we need a pair (x, y), 
where the components represent, respectively, the height and the weight of a partic- 
ular person. To be able to describe such experiments mathematically, we must study 
multidimensional random variables. 

In Section 4.2 we introduce the basic notations involved and study joint, marginal, 
and conditional distributions. In Section 4.3 we examine independent random vari- 
ables and investigate some consequences of independence. Section 4.4 deals with 
functions of several random variables and their induced distributions. In Section 4.5 
we consider moments, covariance, and correlation, and in Section 4.6 we study con- 
ditional expectation. The last section deals with ordered observations. 
In this section we study multidimensional RVs. Let $(\Omega, \mathcal{S}, P)$ be a fixed but otherwise arbitrary probability space.


Definition 1. The collection $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ defined on $(\Omega, \mathcal{S}, P)$ into $\mathbb{R}_n$ by
$$\mathbf{X}(\omega) = (X_1(\omega), X_2(\omega), \ldots, X_n(\omega)), \qquad \omega \in \Omega,$$
is called an $n$-dimensional RV if the inverse image of every $n$-dimensional interval
$$I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i,\ a_i \in \mathbb{R},\ i = 1, 2, \ldots, n\}$$
is also in $\mathcal{S}$, that is, if
$$\mathbf{X}^{-1}(I) = \{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \in \mathcal{S} \qquad \text{for } a_i \in \mathbb{R}.$$


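Theorem 1. Let $X_1, X_2, \ldots, X_n$ be $n$ RVs on $(\Omega, \mathcal{S}, P)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is an $n$-dimensional RV on $(\Omega, \mathcal{S}, P)$.

Proof. Let $I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i,\ i = 1, 2, \ldots, n\}$. Then
$$\{(X_1, X_2, \ldots, X_n) \in I\} = \{\omega : X_1(\omega) \le a_1, X_2(\omega) \le a_2, \ldots, X_n(\omega) \le a_n\} = \bigcap_{k=1}^{n} \{\omega : X_k(\omega) \le a_k\} \in \mathcal{S},$$
as asserted.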

From now on we restrict attention to two-dimensional random variables. The dis- 
cussion for the n-dimensional (n > 2) case is similar except when indicated. The 
development follows closely the one-dimensional case. 

Definition 2. The function $F(\cdot, \cdot)$, defined by
$$F(x, y) = P\{X \le x, Y \le y\}, \qquad \text{all } (x, y) \in \mathbb{R}_2, \tag{1}$$
is known as the DF of the RV $(X, Y)$.


Following the discussion in Section 2.3, it is easily shown that

(i) $F(x, y)$ is nondecreasing and continuous from the right with respect to each coordinate, and

(ii) $$\lim_{\substack{x \to +\infty \\ y \to +\infty}} F(x, y) = F(+\infty, +\infty) = 1,$$
$$\lim_{y \to -\infty} F(x, y) = F(x, -\infty) = 0 \qquad \text{for all } x,$$
$$\lim_{x \to -\infty} F(x, y) = F(-\infty, y) = 0 \qquad \text{for all } y.$$

But (i) and (ii) are not sufficient conditions to make any function $F(\cdot, \cdot)$ a DF.
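Example 1. Let $F$ be a function (Fig. 1) of two variables defined by
$$F(x, y) = \begin{cases} 0, & x < 0 \text{ or } x + y < 1 \text{ or } y < 0, \\ 1, & \text{otherwise}. \end{cases}$$
Then $F$ satisfies both (i) and (ii) above. However, $F$ is not a DF since
$$P\left\{\tfrac{1}{3} < X \le 1, \tfrac{1}{3} < Y \le 1\right\} = F(1, 1) + F\left(\tfrac{1}{3}, \tfrac{1}{3}\right) - F\left(1, \tfrac{1}{3}\right) - F\left(\tfrac{1}{3}, 1\right) = 1 + 0 - 1 - 1 = -1 < 0.$$

Fig. 1. $F(x, y) = 1$ on the region $x \ge 0$, $y \ge 0$, $x + y \ge 1$, and $F(x, y) = 0$ elsewhere.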


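Let $x_1 < x_2$ and $y_1 < y_2$. We have
$$\begin{aligned} P\{x_1 < X \le x_2, y_1 < Y \le y_2\} &= P\{X \le x_2, Y \le y_2\} + P\{X \le x_1, Y \le y_1\} \\ &\quad - P\{X \le x_1, Y \le y_2\} - P\{X \le x_2, Y \le y_1\} \\ &= F(x_2, y_2) + F(x_1, y_1) - F(x_1, y_2) - F(x_2, y_1) \\ &\ge 0 \end{aligned}$$
for all pairs $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$, $y_1 < y_2$ (see Fig. 2).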
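The rectangle increment above is easy to compute for any candidate function. The Python sketch below introduces a hypothetical helper `rectangle_mass` that evaluates $F(x_2, y_2) + F(x_1, y_1) - F(x_1, y_2) - F(x_2, y_1)$. It applies the helper to the function of Example 1. The negative result on the square $(\tfrac{1}{3}, 1] \times (\tfrac{1}{3}, 1]$ shows directly why that function cannot be a DF.

```python
def rectangle_mass(F, x1, y1, x2, y2):
    """F(x2,y2) + F(x1,y1) - F(x1,y2) - F(x2,y1); equals P{x1 < X <= x2, y1 < Y <= y2} if F is a DF."""
    return F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1)

def F_example1(x, y):
    """The function of Example 1: 0 if x < 0, x + y < 1, or y < 0; 1 otherwise."""
    return 0 if (x < 0 or x + y < 1 or y < 0) else 1

print(rectangle_mass(F_example1, 1/3, 1/3, 1, 1))   # -1, so F_example1 is not a DF
```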


Theorem 2. A function $F$ of two variables is a DF of some two-dimensional RV if and only if it satisfies the following conditions:

(i) $F$ is nondecreasing and right continuous with respect to both arguments.
(ii) $F(-\infty, y) = F(x, -\infty) = 0$ and $F(+\infty, +\infty) = 1$.
(iii) For every $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$ and $y_1 < y_2$ the inequality
$$F(x_2, y_2) - F(x_2, y_1) + F(x_1, y_1) - F(x_1, y_2) \ge 0 \tag{2}$$
holds.

Fig. 2. The rectangle $x_1 < x \le x_2$, $y_1 < y \le y_2$, with corners $(x_1, y_1)$, $(x_2, y_1)$, $(x_1, y_2)$, $(x_2, y_2)$.


The "only if" part of the theorem has already been established. The "if" part will not be proved here (see Tucker [113, p. 26]).
Theorem 2 can be generalized to the n-dimensional case in the following manner. 


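Theorem 3. A function $F(x_1, x_2, \ldots, x_n)$ is the joint DF of some $n$-dimensional RV if and only if $F$ is nondecreasing and continuous from the right with respect to all the arguments $x_1, x_2, \ldots, x_n$ and satisfies the following conditions:

(i) $$F(-\infty, x_2, \ldots, x_n) = F(x_1, -\infty, x_3, \ldots, x_n) = \cdots = F(x_1, \ldots, x_{n-1}, -\infty) = 0,$$
$$F(+\infty, +\infty, \ldots, +\infty) = 1.$$

(ii) For every $(x_1, x_2, \ldots, x_n) \in \mathbb{R}_n$ and all $\varepsilon_i > 0$ $(i = 1, 2, \ldots, n)$, the inequality
$$\begin{aligned} &F(x_1 + \varepsilon_1, x_2 + \varepsilon_2, \ldots, x_n + \varepsilon_n) \\ &\quad - \sum_{i=1}^{n} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_n + \varepsilon_n) \\ &\quad + \sum_{\substack{i, j = 1 \\ i < j}}^{n} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_{j-1} + \varepsilon_{j-1}, x_j, x_{j+1} + \varepsilon_{j+1}, \ldots, x_n + \varepsilon_n) \\ &\quad - \cdots + (-1)^n F(x_1, x_2, \ldots, x_n) \ge 0 \end{aligned} \tag{3}$$
holds.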


We restrict ourselves here to two-dimensional RVs of the discrete or continuous 
type, which we now define. 


Definition 3. A two-dimensional (or bivariate) RV $(X, Y)$ is said to be of the discrete type if it takes on pairs of values belonging to a countable set of pairs $A$ with probability 1. We call every pair $(x_i, y_j)$ that is assumed with positive probability $p_{ij}$ a jump point of the DF of $(X, Y)$, and call $p_{ij}$ the jump at $(x_i, y_j)$. Here $A$ is the support of the distribution of $(X, Y)$.


Clearly, $\sum_{i,j} p_{ij} = 1$. As for the DF of $(X, Y)$, we have
$$F(x, y) = \sum_B p_{ij},$$
where $B = \{(i, j) : x_i \le x, y_j \le y\}$.


Definition 4. Let $(X, Y)$ be an RV of the discrete type that takes on pairs of values $(x_i, y_j)$, $i = 1, 2, \ldots$ and $j = 1, 2, \ldots$. We call
$$p_{ij} = P\{X = x_i, Y = y_j\}, \qquad i = 1, 2, \ldots,\ j = 1, 2, \ldots,$$
the joint probability mass function (PMF) of $(X, Y)$.


Example 2. A die is rolled, and a coin is tossed independently. Let $X$ be the face value on the die, and let $Y = 0$ if a tail turns up and $Y = 1$ if a head turns up. Then
$$A = \{(1, 0), (2, 0), \ldots, (6, 0), (1, 1), (2, 1), \ldots, (6, 1)\},$$
and
$$p_{ij} = \frac{1}{12} \qquad \text{for } i = 1, 2, \ldots, 6;\ j = 0, 1.$$


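The DF of $(X, Y)$ is given by
$$F(x, y) = \begin{cases} 0, & x < 1,\ -\infty < y < \infty; \quad -\infty < x < \infty,\ y < 0, \\ \tfrac{1}{12}, & 1 \le x < 2,\ 0 \le y < 1, \\ \tfrac{2}{12}, & 2 \le x < 3,\ 0 \le y < 1; \quad 1 \le x < 2,\ 1 \le y, \\ \tfrac{3}{12}, & 3 \le x < 4,\ 0 \le y < 1, \\ \tfrac{4}{12}, & 4 \le x < 5,\ 0 \le y < 1; \quad 2 \le x < 3,\ 1 \le y, \\ \tfrac{5}{12}, & 5 \le x < 6,\ 0 \le y < 1, \\ \tfrac{6}{12}, & 6 \le x,\ 0 \le y < 1; \quad 3 \le x < 4,\ 1 \le y, \\ \tfrac{8}{12}, & 4 \le x < 5,\ 1 \le y, \\ \tfrac{10}{12}, & 5 \le x < 6,\ 1 \le y, \\ 1, & 6 \le x,\ 1 \le y. \end{cases}$$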
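Each entry of the table above is a sum of jumps $p_{ij}$ over $B = \{(i, j) : x_i \le x, y_j \le y\}$. The short Python sketch below computes $F(x, y)$ for Example 2 in exactly that way. The helper name `joint_df` and the evaluation points are illustrative choices. The printed values can be compared with the corresponding rows of the table.

```python
from fractions import Fraction

# Joint PMF of Example 2: die face i in 1..6, coin Y = j in {0, 1}, each pair with mass 1/12
pmf = {(i, j): Fraction(1, 12) for i in range(1, 7) for j in (0, 1)}

def joint_df(x, y):
    """F(x, y) = sum of p_ij over B = {(i, j): x_i <= x, y_j <= y}."""
    return sum(w for (xi, yj), w in pmf.items() if xi <= x and yj <= y)

print(joint_df(0.5, 2), joint_df(3.5, 0.5), joint_df(4.5, 1), joint_df(6, 1))   # 0 1/4 2/3 1
```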
Theorem 4. A collection of nonnegative numbers $\{p_{ij} : i = 1, 2, \ldots;\ j = 1, 2, \ldots\}$ satisfying $\sum_{i,j} p_{ij} = 1$ is the PMF of some RV.
The proof of Theorem 4 is easy to construct with the help of Theorem 2.


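Definition 5. A two-dimensional RV $(X, Y)$ is said to be of the continuous type if there exists a nonnegative function $f(\cdot, \cdot)$ such that for every pair $(x, y) \in \mathbb{R}_2$ we have
$$F(x, y) = \int_{-\infty}^{x} \left[\int_{-\infty}^{y} f(u, v)\, dv\right] du, \tag{4}$$
where $F$ is the DF of $(X, Y)$. The function $f$ is called the (joint) PDF of $(X, Y)$.
Clearly,
$$F(+\infty, +\infty) = \lim_{\substack{x \to +\infty \\ y \to +\infty}} \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\, dv\, du = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u, v)\, dv\, du = 1.$$
If $f$ is continuous at $(x, y)$, then
$$\frac{\partial^2 F(x, y)}{\partial x\, \partial y} = f(x, y). \tag{5}$$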


Example 3. Let $(X, Y)$ be an RV with joint PDF (Fig. 3) given by
$$f(x, y) = \begin{cases} e^{-(x + y)}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 3. $f(x, y) = \exp[-(x + y)]$, $x > 0$, $y > 0$.

Then
$$F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}), & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$$


Theorem 5. If $f$ is a nonnegative function satisfying $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$, then $f$ is the joint density function of some RV.

Proof. For the proof, define
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\, dv\, du$$
and use Theorem 2.
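Let $(X, Y)$ be a two-dimensional RV with PMF
$$p_{ij} = P\{X = x_i, Y = y_j\}.$$
Then
$$\sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{Y = y_j\} \tag{6}$$
and
$$\sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{X = x_i\}. \tag{7}$$
Let us write
$$p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij} \quad \text{and} \quad p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}. \tag{8}$$
Then $p_{i\cdot} \ge 0$ and $\sum_i p_{i\cdot} = 1$, $p_{\cdot j} \ge 0$ and $\sum_j p_{\cdot j} = 1$, and $\{p_{i\cdot}\}$, $\{p_{\cdot j}\}$ represent PMFs.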


Definition 6. The collection of numbers $\{p_{i\cdot}\}$ is called the marginal PMF of $X$, and the collection $\{p_{\cdot j}\}$, the marginal PMF of $Y$.


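Example 4. A fair coin is tossed three times. Let $X$ = number of heads in three tossings, and $Y$ = difference, in absolute value, between number of heads and number of tails. The joint PMF of $(X, Y)$ is given in the following table:
$$\begin{array}{c|cccc|c} Y \backslash X & 0 & 1 & 2 & 3 & \text{Row total} \\ \hline 1 & 0 & \tfrac{3}{8} & \tfrac{3}{8} & 0 & \tfrac{3}{4} \\ 3 & \tfrac{1}{8} & 0 & 0 & \tfrac{1}{8} & \tfrac{1}{4} \\ \hline \text{Column total} & \tfrac{1}{8} & \tfrac{3}{8} & \tfrac{3}{8} & \tfrac{1}{8} & 1 \end{array}$$
The marginal PMF of $Y$ is shown in the column representing row totals, and the marginal PMF of $X$, in the row representing column totals.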
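The table of Example 4 can be generated by enumerating the eight equally likely outcomes. The Python sketch below builds the joint PMF of $(X, Y)$ and then forms the marginals $p_{i\cdot}$ and $p_{\cdot j}$ by summing over the other coordinate, as in (8). It is an illustration only and uses nothing beyond the standard library.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

joint = defaultdict(Fraction)
for outcome in product("HT", repeat=3):             # 8 equally likely outcomes
    heads = outcome.count("H")
    x, y = heads, abs(heads - (3 - heads))          # X = #heads, Y = |#heads - #tails|
    joint[(x, y)] += Fraction(1, 8)

marg_x = defaultdict(Fraction)
marg_y = defaultdict(Fraction)
for (x, y), w in joint.items():
    marg_x[x] += w                                  # p_{i.}: sum over j
    marg_y[y] += w                                  # p_{.j}: sum over i

print(dict(joint))
print(dict(marg_x), dict(marg_y))
```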


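If $(X, Y)$ is an RV of the continuous type with PDF $f$, then
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \tag{9}$$
and
$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\, dx \tag{10}$$
satisfy $f_1(x) \ge 0$, $f_2(y) \ge 0$, and $\int_{-\infty}^{\infty} f_1(x)\, dx = 1$, $\int_{-\infty}^{\infty} f_2(y)\, dy = 1$. It follows that $f_1(x)$ and $f_2(y)$ are PDFs.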


Definition 7. The functions $f_1(x)$ and $f_2(y)$, defined in (9) and (10), are called the marginal PDF of $X$ and the marginal PDF of $Y$, respectively.


Example 5. Let $(X, Y)$ be jointly distributed with PDF $f(x, y) = 2$, $0 < x < y < 1$, and $= 0$ otherwise (Fig. 4). Then

Fig. 4. $f(x, y) = 2$, $0 < x < y < 1$.

$$f_1(x) = \begin{cases} \int_x^1 2\,dy = 2(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise} \end{cases}$$
and
$$f_2(y) = \begin{cases} \int_0^y 2\,dx = 2y, & 0 < y < 1, \\ 0, & \text{otherwise} \end{cases}$$
are the two marginal density functions.
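The marginal densities of Example 5 can be confirmed by integrating out the other variable numerically, as in (9) and (10). The sketch below uses a composite midpoint rule with an assumed grid of 20,000 cells. It is an illustration only, and it prints the exact answers $2(1 - x)$ and $2y$ alongside the numerical values.

```python
def f(x, y):
    """Joint PDF of Example 5: 2 on the triangle 0 < x < y < 1, else 0."""
    return 2.0 if 0 < x < y < 1 else 0.0

def midpoint(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f1(x):
    """Marginal PDF of X, formula (9): integrate out y."""
    return midpoint(lambda y: f(x, y), 0.0, 1.0)

def f2(y):
    """Marginal PDF of Y, formula (10): integrate out x."""
    return midpoint(lambda x: f(x, y), 0.0, 1.0)

for t in (0.25, 0.5, 0.9):
    print(t, f1(t), 2 * (1 - t), f2(t), 2 * t)
```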


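Definition 8. Let $(X, Y)$ be an RV with DF $F$. Then the marginal DF of $X$ is defined by
$$F_1(x) = F(x, \infty) = \lim_{y \to \infty} F(x, y) = \begin{cases} \sum_{x_i \le x} p_{i\cdot} & \text{if } (X, Y) \text{ is discrete,} \\ \int_{-\infty}^{x} f_1(t)\,dt & \text{if } (X, Y) \text{ is continuous.} \end{cases} \tag{11}$$

A similar definition is given for the marginal DF of $Y$.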


In general, given a DF $F(x_1, x_2, \ldots, x_n)$ of an $n$-dimensional RV $(X_1, X_2, \ldots, X_n)$, one can obtain any $k$-dimensional ($1 \le k \le n - 1$) marginal DF from it. Thus the marginal DF of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$, is given by
$$\lim_{\substack{x_i \to \infty \\ i \ne i_1, i_2, \ldots, i_k}} F(x_1, x_2, \ldots, x_n) = F(+\infty, \ldots, +\infty, x_{i_1}, +\infty, \ldots, +\infty, x_{i_k}, +\infty, \ldots, +\infty).$$


We now consider the concept of conditional distributions. Let $(X, Y)$ be an RV of the discrete type with PMF $p_{ij} = P\{X = x_i, Y = y_j\}$. The marginal PMFs are $p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij}$ and $p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}$. Recall that if $A, B \in \mathcal{S}$ and $PB > 0$, the conditional probability of $A$, given $B$, is defined by
$$P\{A \mid B\} = \frac{P(AB)}{PB}.$$
Take $A = \{X = x_i\} = \{(x_i, y): -\infty < y < \infty\}$ and $B = \{Y = y_j\} = \{(x, y_j): -\infty < x < \infty\}$, and assume that $PB = P\{Y = y_j\} = p_{\cdot j} > 0$. Then $A \cap B = \{X = x_i, Y = y_j\}$, and
$$P\{A \mid B\} = P\{X = x_i \mid Y = y_j\} = \frac{p_{ij}}{p_{\cdot j}}.$$
For fixed $j$, the function $P\{X = x_i \mid Y = y_j\} \ge 0$ and $\sum_i P\{X = x_i \mid Y = y_j\} = 1$. Thus $P\{X = x_i \mid Y = y_j\}$, for fixed $j$, defines a PMF.


Definition 9. Let $(X, Y)$ be an RV of the discrete type. If $P\{Y = y_j\} > 0$, the function
$$P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}} \tag{12}$$
for fixed $j$ is known as the conditional PMF of $X$, given $Y = y_j$. A similar definition is given for $P\{Y = y_j \mid X = x_i\}$, the conditional PMF of $Y$, given $X = x_i$, provided that $P\{X = x_i\} > 0$.


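Example 6. For the joint PMF of Example 4, we have for $Y = 1$,
$$P\{X = i \mid Y = 1\} = \frac{3/8}{6/8} = \frac{1}{2}, \quad i = 1, 2.$$
Similarly,
$$P\{X = i \mid Y = 3\} = \begin{cases} \frac{1}{2}, & \text{if } i = 0, 3, \\ 0, & \text{if } i = 1, 2, \end{cases} \qquad P\{Y = j \mid X = 0\} = \begin{cases} 0, & \text{if } j = 1, \\ 1, & \text{if } j = 3, \end{cases}$$
and so on.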
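Formula (12) amounts to dividing one row (or column) of the joint table by its total. This is a minimal Python sketch. It assumes the joint PMF of Example 4 is stored as a dictionary keyed by $(x, y)$. The function names are hypothetical, and points of zero probability are simply left out of the dictionary.

```python
from fractions import Fraction

# Joint PMF of Example 4 as {(x, y): P{X = x, Y = y}}; zero cells omitted.
joint = {(0, 3): Fraction(1, 8), (1, 1): Fraction(3, 8),
         (2, 1): Fraction(3, 8), (3, 3): Fraction(1, 8)}

def conditional_x_given_y(joint, y):
    """Conditional PMF of X given Y = y, formula (12): p_ij / p_.j."""
    p_dot_j = sum(p for (_, yj), p in joint.items() if yj == y)
    if p_dot_j == 0:
        raise ValueError("P{Y = y} must be positive")
    return {x: p / p_dot_j for (x, yj), p in joint.items() if yj == y}

def conditional_y_given_x(joint, x):
    """Conditional PMF of Y given X = x: p_ij / p_i."""
    p_i_dot = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_i_dot == 0:
        raise ValueError("P{X = x} must be positive")
    return {y: p / p_i_dot for (xi, y), p in joint.items() if xi == x}

print(conditional_x_given_y(joint, 1))   # {1: Fraction(1, 2), 2: Fraction(1, 2)}
print(conditional_y_given_x(joint, 0))   # {3: Fraction(1, 1)}
```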
Next suppose that $(X, Y)$ is an RV of the continuous type with joint PDF $f$. Since $P\{X = x\} = 0$ and $P\{Y = y\} = 0$ for any $x, y$, the probability $P\{X \le x \mid Y = y\}$, or $P\{Y \le y \mid X = x\}$, is not defined. Let $\varepsilon > 0$, and suppose that $P\{y - \varepsilon < Y \le y + \varepsilon\} > 0$. For every $x$ and every interval $(y - \varepsilon, y + \varepsilon]$, consider the conditional probability of the event $\{X \le x\}$, given that $Y \in (y - \varepsilon, y + \varepsilon]$. We have
$$P\{X \le x \mid y - \varepsilon < Y \le y + \varepsilon\} = \frac{P\{X \le x, y - \varepsilon < Y \le y + \varepsilon\}}{P\{y - \varepsilon < Y \le y + \varepsilon\}}.$$
For any fixed interval $(y - \varepsilon, y + \varepsilon]$, the expression above defines the conditional DF of $X$, given that $Y \in (y - \varepsilon, y + \varepsilon]$, provided that $P\{Y \in (y - \varepsilon, y + \varepsilon]\} > 0$. We shall be interested in the case where the limit
$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\}$$
exists.


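Definition 10. The conditional DF of an RV $X$, given $Y = y$, is defined as the limit
$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\}, \tag{13}$$
provided that the limit exists. If the limit exists, we denote it by $F_{X|Y}(x \mid y)$. We then define the conditional density function of $X$, given $Y = y$, $f_{X|Y}(x \mid y)$, as a nonnegative function satisfying
$$F_{X|Y}(x \mid y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\,dt \quad \text{for all } x \in \mathbb{R}. \tag{14}$$
For fixed $y$ we see that $f_{X|Y}(x \mid y) \ge 0$ and $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = 1$. Thus $f_{X|Y}(x \mid y)$ is a PDF for fixed $y$.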

Suppose that $(X, Y)$ is an RV of the continuous type with PDF $f$. At every point $(x, y)$ where $f$ is continuous and the marginal PDF $f_2(y) > 0$ and is continuous, we have
$$F_{X|Y}(x \mid y) = \lim_{\varepsilon \to 0+} \frac{P\{X \le x, Y \in (y - \varepsilon, y + \varepsilon]\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}} = \lim_{\varepsilon \to 0+} \frac{\int_{-\infty}^{x} \left[\int_{y-\varepsilon}^{y+\varepsilon} f(u, v)\,dv\right] du}{\int_{y-\varepsilon}^{y+\varepsilon} f_2(v)\,dv}.$$
Dividing numerator and denominator by $2\varepsilon$ and passing to the limit as $\varepsilon \to 0+$, we have
$$F_{X|Y}(x \mid y) = \frac{\int_{-\infty}^{x} f(u, y)\,du}{f_2(y)} = \int_{-\infty}^{x} \left[\frac{f(u, y)}{f_2(y)}\right] du.$$
It follows that there exists a conditional PDF of $X$, given $Y = y$, that is expressed by
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}, \quad f_2(y) > 0.$$


We have thus proved the following theorem.

Theorem 6. Let $f$ be the PDF of an RV $(X, Y)$ of the continuous type, and let $f_2$ be the marginal PDF of $Y$. At every point $(x, y)$ at which $f$ is continuous and $f_2(y) > 0$ and is continuous, the conditional PDF of $X$, given $Y = y$, exists and is expressed by
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}. \tag{15}$$

Note that
$$\int_{-\infty}^{x} f(u, y)\,du = f_2(y)\,F_{X|Y}(x \mid y),$$
so that
$$F_1(x) = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{x} f(u, y)\,du\right] dy = \int_{-\infty}^{\infty} f_2(y)\,F_{X|Y}(x \mid y)\,dy, \tag{16}$$
where $F_1$ is the marginal DF of $X$.


It is clear that similar definitions may be made for the conditional DF and conditional PDF of the RV $Y$, given $X = x$, and an analog of Theorem 6 holds.

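In the general case, let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV of the continuous type with PDF $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$. Also, let $\{i_1 < i_2 < \cdots < i_k, j_1 < j_2 < \cdots < j_l\}$ be a subset of $\{1, 2, \ldots, n\}$. Then
$$F(x_{i_1}, \ldots, x_{i_k} \mid x_{j_1}, \ldots, x_{j_l}) = \frac{\int_{-\infty}^{x_{i_1}} \cdots \int_{-\infty}^{x_{i_k}} f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}(u_1, \ldots, u_k, x_{j_1}, \ldots, x_{j_l}) \prod_{i=1}^{k} du_i}{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}(u_1, \ldots, u_k, x_{j_1}, \ldots, x_{j_l}) \prod_{i=1}^{k} du_i}, \tag{17}$$
provided that the denominator exceeds 0. Here $f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}$ is the joint marginal PDF of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k}, X_{j_1}, X_{j_2}, \ldots, X_{j_l})$. The conditional densities are obtained in a similar manner.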

The case in which (X1, X2, ... , Xn) is of the discrete type is treated similarly. 


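Example 7. For the joint PDF of Example 5, we have
$$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{1}{1 - x}, \quad x < y < 1,$$
so that the conditional PDF $f_{Y|X}$ is uniform on $(x, 1)$. Also,
$$f_{X|Y}(x \mid y) = \frac{1}{y}, \quad 0 < x < y,$$
which is uniform on $(0, y)$. Thus
$$P\{X \ge \tfrac{1}{3} \mid y = \tfrac{2}{3}\} = \int_{1/3}^{2/3} \frac{1}{2/3}\,dx = \frac{1}{2}.$$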
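Definition 10 conditions on a shrinking window $Y \in (y - \varepsilon, y + \varepsilon]$. The simulation below mimics that construction for Example 7 with a fixed small $\varepsilon$. It relies on a standard fact that the text does not state: the minimum and maximum of two independent uniform $(0, 1)$ RVs have joint density 2 on $0 < x < y < 1$. The window width, sample size, and seed are arbitrary choices. The estimate should be close to $1/2$.

```python
import random

def sample_xy(rng):
    """Draw (X, Y) with density 2 on 0 < x < y < 1: min and max of two uniforms."""
    u, v = rng.random(), rng.random()
    return min(u, v), max(u, v)

def conditional_prob_window(y0, event, eps=0.01, n=400_000, seed=1):
    """Estimate P{event(X) | Y in (y0 - eps, y0 + eps]} by simulation, as in (13)."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n):
        x, y = sample_xy(rng)
        if y0 - eps < y <= y0 + eps:      # keep only draws inside the window
            total += 1
            hits += event(x)
    return hits / total

# Given Y = 2/3, X is uniform on (0, 2/3), so P{X >= 1/3 | Y = 2/3} = 1/2.
est = conditional_prob_window(2 / 3, lambda x: x >= 1 / 3)
print(est)
```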


We conclude this section with a discussion of a technique called truncation. We consider two types of truncation, each with a different objective. In probabilistic modeling we use truncated distributions when sampling from an incomplete population.

Definition 11. Let $X$ be an RV on $(\Omega, \mathcal{S}, P)$, and let $T \in \mathcal{B}$ be such that $0 < P\{X \in T\} < 1$. Then the conditional distribution $P\{X \le x \mid X \in T\}$, defined for any real $x$, is called the truncated distribution of $X$.


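If $X$ is a discrete RV with PMF $p_i = P\{X = x_i\}$, $i = 1, 2, \ldots$, the truncated distribution of $X$ is given by
$$P\{X = x_i \mid X \in T\} = \frac{P\{X = x_i, X \in T\}}{P\{X \in T\}} = \begin{cases} \dfrac{p_i}{\sum_{x_j \in T} p_j} & \text{if } x_i \in T, \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$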

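If $X$ is of the continuous type with PDF $f$, then
$$P\{X \le x \mid X \in T\} = \frac{P\{X \le x, X \in T\}}{P\{X \in T\}} = \frac{\int_{(-\infty, x] \cap T} f(y)\,dy}{\int_{T} f(y)\,dy}. \tag{19}$$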


The PDF of the truncated distribution is given by
$$h(x) = \begin{cases} \dfrac{f(x)}{\int_T f(y)\,dy}, & x \in T, \\ 0, & x \notin T. \end{cases} \tag{20}$$

Here $T$ is not necessarily a bounded set of real numbers. If we write $Y$ for the RV with distribution function $P\{X \le x \mid X \in T\}$, then $Y$ has support $T$.


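Example 8. Let $X$ be an RV with standard normal PDF
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$
Let $T = (-\infty, 0]$. Then $P\{X \in T\} = 1/2$, since $X$ is symmetric and continuous. For the truncated PDF, we have
$$h(x) = \begin{cases} 2f(x), & -\infty < x \le 0, \\ 0, & x > 0. \end{cases}$$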

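Some other examples are the truncated Poisson distribution
$$P\{X = k\} = \frac{e^{-\lambda}}{1 - e^{-\lambda}}\,\frac{\lambda^k}{k!}, \quad k = 1, 2, \ldots,$$
where $T = \{X \ge 1\}$, and the truncated uniform distribution
$$f(x) = \frac{1}{\theta}, \quad 0 < x < \theta, \text{ and } = 0 \text{ otherwise,}$$
where $T = \{X < \theta\}$, $\theta > 0$.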


The second type of truncation is very useful in probability limit theory, especially when the DF $F$ in question does not have a finite mean. Let $a < b$ be finite real numbers. Define the RV $X^*$ by
$$X^* = \begin{cases} X & \text{if } a \le X \le b, \\ 0 & \text{if } X < a \text{ or } X > b. \end{cases}$$
This method produces a bounded RV, $|X^*| \le \max(|a|, |b|)$ (and $P\{a \le X^* \le b\} = 1$ when $a \le 0 \le b$), so that $X^*$ has moments of all orders. The special case when $b = c > 0$ and $a = -c$ is quite useful in probability limit theory when we wish to approximate $X$ through bounded RVs. We say that $X^c$ is $X$ truncated at $c$ if $X^c = X$ for $|X| \le c$, and $= 0$ for $|X| > c$. Then $E|X^c|^k \le c^k$ for every $k > 0$. Moreover,
$$P\{X \ne X^c\} = P\{|X| > c\},$$
so that $c$ can be selected sufficiently large to make $P\{|X| > c\}$ arbitrarily small. For example, if $E|X|^2 < \infty$, then
$$P\{|X| > c\} \le \frac{E|X|^2}{c^2},$$
and given $\varepsilon > 0$, we can choose $c$ such that $E|X|^2/c^2 < \varepsilon$.
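The distribution of $X^c$ is no longer the truncated distribution $P\{X \le x \mid |X| \le c\}$. In fact, assuming $P\{X = -c\} = 0$,
$$F^c(y) = \begin{cases} 0, & y < -c, \\ F(y) - F(-c), & -c \le y < 0, \\ 1 - F(c) + F(y), & 0 \le y < c, \\ 1, & y \ge c, \end{cases}$$
where $F$ is the DF of $X$ and $F^c$ is that of $X^c$.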
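The formula for $F^c$ can be compared with a simulation. The sketch below assumes $X$ is standard Cauchy, a law with no finite mean, generated by the inverse-DF method $X = \tan[\pi(U - 1/2)]$. It truncates at an arbitrarily chosen $c = 2$ and compares empirical frequencies with $F^c$. Note that $F^c$ jumps at 0 by exactly $P\{|X| > c\}$, the mass moved to the origin.

```python
import math
import random

def cauchy_df(x):
    """DF of the standard Cauchy law, which has no finite mean."""
    return 0.5 + math.atan(x) / math.pi

def truncated_at_c_df(F, c, y):
    """DF of X^c (X kept when |X| <= c, set to 0 otherwise), F continuous."""
    if y < -c:
        return 0.0
    if y < 0:
        return F(y) - F(-c)
    if y < c:
        return 1 - F(c) + F(y)
    return 1.0

c = 2.0
rng = random.Random(7)
xs = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(200_000)]
xc = [x if abs(x) <= c else 0.0 for x in xs]          # realizations of X^c
for y in (-1.0, -0.001, 0.0, 1.5):
    empirical = sum(v <= y for v in xc) / len(xc)
    print(y, empirical, truncated_at_c_df(cauchy_df, c, y))
```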
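A third type of truncation, sometimes called Winsorization, sets
$$X^* = X \text{ if } a \le X \le b, \quad = a \text{ if } X < a, \quad \text{and } = b \text{ if } X > b.$$
This method also produces an RV for which $P\{a \le X^* \le b\} = 1$, and moments of all orders for $X^*$ exist, but its DF is given by
$$F^*(y) = 0 \text{ for } y < a, \quad = F(y) \text{ for } a \le y < b, \quad = 1 \text{ for } y \ge b.$$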


PROBLEMS 4.2 


1. Let $F(x, y) = 1$ if $x + 2y \ge 1$, and $= 0$ if $x + 2y < 1$. Does $F$ define a DF in the plane?


2. Let $T$ be a closed triangle in the plane with vertices $(0, 0)$, $(0, \sqrt{2})$, and $(\sqrt{2}, \sqrt{2})$. Let $F(x, y)$ denote the elementary area of the intersection of $T$ with $\{(x_1, x_2): x_1 \le x, x_2 \le y\}$. Show that $F$ defines a DF in the plane, and find its marginal DFs.

3. Let $(X, Y)$ have the joint PDF $f$ defined by $f(x, y) = 1/2$ inside the square with corners at the points $(1, 0)$, $(0, 1)$, $(-1, 0)$, and $(0, -1)$ in the $(x, y)$-plane, and $= 0$ otherwise. Find the marginal PDFs of $X$ and $Y$ and the two conditional PDFs.


4. Let $f(x, y, z) = e^{-x-y-z}$, $x > 0$, $y > 0$, $z > 0$, and $= 0$ otherwise, be the joint PDF of $(X, Y, Z)$. Compute $P\{X < Y < Z\}$ and $P\{X = Y < Z\}$.


5. Let $(X, Y)$ have the joint PDF $f(x, y) = \frac{6}{7}\left(x^2 + \frac{xy}{2}\right)$, $0 < x < 1$, $0 < y < 2$, and $= 0$ otherwise. Find $P\{Y < 1 \mid X < 1\}$.
6. For DFs $F, F_1, F_2, \ldots, F_n$ show that
$$1 - \sum_{i=1}^{n} \{1 - F_i(x_i)\} \le F(x_1, x_2, \ldots, x_n) \le \min_{1 \le i \le n} F_i(x_i)$$
for all real numbers $x_1, x_2, \ldots, x_n$ if and only if the $F_i$'s are marginal DFs of $F$.


7. For the bivariate negative binomial distribution
$$P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\,y!\,(k - 1)!}\, p_1^x p_2^y (1 - p_1 - p_2)^k,$$
where $x, y = 0, 1, 2, \ldots$, $k \ge 1$ is an integer, $0 < p_1 < 1$, $0 < p_2 < 1$, and $p_1 + p_2 < 1$, find the marginal PMFs of $X$ and $Y$ and the conditional distributions.


In Problems 8 to 10, the bivariate distributions considered are not unique generalizations of the corresponding univariate distributions.


8. For the bivariate Cauchy RV $(X, Y)$ with PDF
$$f(x, y) = \frac{c}{2\pi (c^2 + x^2 + y^2)^{3/2}}, \quad -\infty < x < \infty,\ -\infty < y < \infty,\ c > 0,$$
find the marginal PDFs of $X$ and $Y$. Find the conditional PDF of $Y$ given $X = x$.


9. For the bivariate beta RV $(X, Y)$ with PDF
$$f(x, y) = \frac{\Gamma(p_1 + p_2 + p_3)}{\Gamma(p_1)\Gamma(p_2)\Gamma(p_3)}\, x^{p_1 - 1} y^{p_2 - 1} (1 - x - y)^{p_3 - 1}, \quad x \ge 0,\ y \ge 0,\ x + y \le 1,$$
where $p_1, p_2, p_3$ are positive real numbers, find the marginal PDFs of $X$ and $Y$ and the conditional PDFs. Find also the conditional PDF of $Y/(1 - X)$, given $X = x$.


10. For the bivariate gamma RV $(X, Y)$ with PDF
$$f(x, y) = \frac{\beta^{\alpha + \gamma}}{\Gamma(\alpha)\Gamma(\gamma)}\, x^{\alpha - 1} (y - x)^{\gamma - 1} e^{-\beta y}, \quad 0 < x < y;\ \alpha, \beta, \gamma > 0,$$
find the marginal PDFs of $X$ and $Y$ and the conditional PDFs. Also, find the conditional PDF of $Y - X$ given $X = x$, and the conditional distribution of $X/Y$ given $Y = y$.

11. For the bivariate hypergeometric RV $(X, Y)$ with PMF
$$P\{X = x, Y = y\} = \binom{N}{n}^{-1} \binom{Np_1}{x} \binom{Np_2}{y} \binom{N - Np_1 - Np_2}{n - x - y}, \quad x, y = 0, 1, 2, \ldots, n,$$
where $x \le Np_1$, $y \le Np_2$, $n - x - y \le N(1 - p_1 - p_2)$, $N$ and $n$ are integers with $n \le N$, and $0 < p_1 < 1$, $0 < p_2 < 1$ so that $p_1 + p_2 \le 1$, find the marginal PMFs of $X$ and $Y$ and the conditional PMFs.


12. Let $X$ be an RV with PDF $f(x) = 1$ if $0 \le x \le 1$, and $= 0$ otherwise. Let
T = {x: <x< i. Find the PDF of the truncated distribution of X, its 
mean, and its variance.


13. Let $X$ be an RV with PMF
$$P\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots,\ \lambda > 0.$$
Suppose that the value $x = 0$ cannot be observed. Find the PMF of the truncated RV, its mean, and its variance.


14. Is the function
$$f(x, y, z, u) = \begin{cases} \exp(-u), & 0 < x < y < z < u < \infty, \\ 0, & \text{elsewhere} \end{cases}$$
a joint density function? If so, find $P\{X < 7\}$, where $(X, Y, Z, U)$ is a random variable with density $f$.
15. Show that the function defined by
$$f(x, y, z, u) = \frac{24}{(1 + x + y + z + u)^5}, \quad x > 0,\ y > 0,\ z > 0,\ u > 0,$$
and zero elsewhere is a joint density function.
(a) Find $P\{X > Y > Z > U\}$.
(b) Find $P\{X + Y - Z - U > 1\}$.


16. Let $(X, Y)$ have joint density function $f$ and joint distribution function $F$. Suppose that
$$f(x_1, y_1)\, f(x_2, y_2) \le f(x_1, y_2)\, f(x_2, y_1)$$
holds for $x_1 \le a \le x_2$ and $y_1 \le b \le y_2$. Show that
$$F(a, b) \le F_1(a) F_2(b).$$


17. Suppose that $(X, Y, Z)$ are jointly distributed with density
$$f(x, y, z) = \begin{cases} g(x)g(y)g(z), & x > 0,\ y > 0,\ z > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
Find $P\{X > Y > Z\}$. Hence find the probability that $(x, y, z) \in \{X > Y > Z\}$ or $\{X < Y < Z\}$. (Here $g$ is a density function on $\mathbb{R}$.)
We recall that the joint distribution of a multiple RV uniquely determines the marginal distributions of the component random variables. In general, however, knowledge of the marginal distributions is not enough to determine the joint distribution. Indeed, it is quite possible to have an infinite collection of joint densities $f_\alpha$ with given marginal densities.


Example 1 (Gumbel [36]). Let $f_1, f_2, f_3$ be three PDFs with corresponding DFs $F_1, F_2, F_3$, and let $\alpha$ be a constant, $|\alpha| \le 1$. Define
$$f_\alpha(x_1, x_2, x_3) = f_1(x_1) f_2(x_2) f_3(x_3) \cdot \{1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]\}.$$


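We show that $f_\alpha$ is a PDF for each $\alpha$ in $[-1, 1]$ and that the collection of densities $\{f_\alpha: -1 \le \alpha \le 1\}$ has the same marginal densities $f_1, f_2, f_3$. First note that
$$\left|[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]\right| \le 1,$$
so that
$$1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1] \ge 0.$$
Also,
$$\begin{aligned} \iiint f_\alpha(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 &= 1 + \alpha\left(\int [2F_1(x_1) - 1] f_1(x_1)\,dx_1\right)\left(\int [2F_2(x_2) - 1] f_2(x_2)\,dx_2\right) \\ &\qquad \cdot \left(\int [2F_3(x_3) - 1] f_3(x_3)\,dx_3\right) \\ &= 1 + \alpha\left\{[F_1^2(x_1) - F_1(x_1)]\Big|_{-\infty}^{\infty}\, [F_2^2(x_2) - F_2(x_2)]\Big|_{-\infty}^{\infty}\, [F_3^2(x_3) - F_3(x_3)]\Big|_{-\infty}^{\infty}\right\} \\ &= 1. \end{aligned}$$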


It follows that $f_\alpha$ is a density function. That $f_1, f_2, f_3$ are the marginal densities of $f_\alpha$ follows similarly.
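Example 1 can be checked numerically in the special case $f_1 = f_2 = f_3 = 1$ on $(0, 1)$, so that $F_i(x) = x$. This choice is an assumption made only for the illustration. The sketch integrates $f_\alpha$ over $(x_2, x_3)$ with a midpoint rule. The result should be 1 for every $x_1$ and every $\alpha \in [-1, 1]$, which confirms that $f_1$ is the marginal density of $X_1$.

```python
def f_alpha(x1, x2, x3, alpha):
    """Gumbel's density with uniform(0, 1) marginals, i.e. f_i = 1 and F_i(x) = x."""
    return 1 + alpha * (2 * x1 - 1) * (2 * x2 - 1) * (2 * x3 - 1)

def marginal_x1(x1, alpha, n=200):
    """Midpoint-rule integral of f_alpha over (x2, x3) in the unit square."""
    h = 1.0 / n
    grid = [(k + 0.5) * h for k in range(n)]
    return h * h * sum(f_alpha(x1, a, b, alpha) for a in grid for b in grid)

for alpha in (-1.0, 0.5, 1.0):
    print(alpha, [round(marginal_x1(x, alpha), 6) for x in (0.1, 0.5, 0.9)])
```

Since the factor $[2F(x) - 1]$ integrates to zero against $f$, the correction term vanishes after integration. This is the same cancellation used in the text.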


In this section we deal with a very special class of distributions in which the 
marginal distributions uniquely determine the joint distribution of a multiple RV. 
First we consider the bivariate case. 

Let $F(x, y)$ and $F_1(x)$, $F_2(y)$, respectively, be the joint DF of $(X, Y)$ and the marginal DFs of $X$ and $Y$.

Definition 1. We say that $X$ and $Y$ are independent if and only if
$$F(x, y) = F_1(x) F_2(y) \quad \text{for all } (x, y) \in \mathbb{R}^2. \tag{1}$$
Lemma 1. If $X$ and $Y$ are independent and $a < c$, $b < d$ are real numbers, then
$$P\{a < X \le c, b < Y \le d\} = P\{a < X \le c\}\, P\{b < Y \le d\}. \tag{2}$$
Theorem 1
(a) A necessary and sufficient condition for RVs $X$, $Y$ of the discrete type to be independent is that
$$P\{X = x_i, Y = y_j\} = P\{X = x_i\}\, P\{Y = y_j\} \tag{3}$$
for all pairs $(x_i, y_j)$.
(b) Two RVs $X$ and $Y$ of the continuous type are independent if and only if
$$f(x, y) = f_1(x) f_2(y) \quad \text{for all } (x, y) \in \mathbb{R}^2, \tag{4}$$
where $f$, $f_1$, $f_2$, respectively, are the joint and marginal densities of $X$ and $Y$, and $f$ is everywhere continuous.


Proof. (a) Let $X, Y$ be independent. Then from Lemma 1, letting $a \uparrow c$ and $b \uparrow d$, we get
$$P\{X = c, Y = d\} = P\{X = c\}\, P\{Y = d\}.$$
Conversely,
$$F(x, y) = \sum_{B} P\{X = x_i, Y = y_j\},$$
where
$$B = \{(i, j): x_i \le x, y_j \le y\}.$$
Then
$$F(x, y) = \sum_{B} P\{X = x_i\}\, P\{Y = y_j\} = \sum_{x_i \le x}\left[\sum_{y_j \le y} P\{Y = y_j\}\right] P\{X = x_i\} = F_1(x) F_2(y).$$

The proof of part (b) is left as an exercise.


Corollary. Let $X$ and $Y$ be independent RVs; then $F_{Y|X}(y \mid x) = F_Y(y)$ for all $y$, and $F_{X|Y}(x \mid y) = F_X(x)$ for all $x$.


Theorem 1. The RVs $X$ and $Y$ are independent if and only if
$$P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\, P\{Y \in A_2\} \tag{5}$$
for all Borel sets $A_1$ on the $x$-axis and $A_2$ on the $y$-axis.
Theorem 2. Let $X$ and $Y$ be independent RVs and $f$ and $g$ be Borel-measurable functions. Then $f(X)$ and $g(Y)$ are also independent.

Proof. We have
$$\begin{aligned} P\{f(X) \le x, g(Y) \le y\} &= P\{X \in f^{-1}(-\infty, x], Y \in g^{-1}(-\infty, y]\} \\ &= P\{X \in f^{-1}(-\infty, x]\}\, P\{Y \in g^{-1}(-\infty, y]\} \\ &= P\{f(X) \le x\}\, P\{g(Y) \le y\}. \end{aligned}$$
Note that a degenerate RV is independent of any RV.
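Example 2. Let $X$ and $Y$ be jointly distributed with PDF
$$f(x, y) = \begin{cases} \dfrac{1 + xy}{4}, & |x| < 1,\ |y| < 1, \\ 0, & \text{otherwise.} \end{cases}$$
Then $X$ and $Y$ are not independent since $f_1(x) = 1/2$, $|x| < 1$, and $f_2(y) = 1/2$, $|y| < 1$, are the marginal densities of $X$ and $Y$, respectively. However, the RVs $X^2$ and $Y^2$ are independent. Indeed, for $0 \le u \le 1$ and $0 \le v \le 1$,
$$\begin{aligned} P\{X^2 \le u, Y^2 \le v\} &= \int_{-v^{1/2}}^{v^{1/2}} \int_{-u^{1/2}}^{u^{1/2}} f(x, y)\,dx\,dy \\ &= \frac{1}{4} \int_{-v^{1/2}}^{v^{1/2}} \left[\int_{-u^{1/2}}^{u^{1/2}} (1 + xy)\,dx\right] dy \\ &= u^{1/2} v^{1/2} \\ &= P\{X^2 \le u\}\, P\{Y^2 \le v\}. \end{aligned}$$

Note that $\phi(X^2)$ and $\psi(Y^2)$ are independent where $\phi$ and $\psi$ are Borel-measurable functions. But $X$ is not a Borel-measurable function of $X^2$.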
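Example 2 can also be verified by numerical integration. This Python sketch uses a midpoint rule on an assumed $400 \times 400$ grid, and the evaluation points $a, b, u, v$ are arbitrary. It compares joint and product probabilities for $(X, Y)$ and for $(X^2, Y^2)$.

```python
def f(x, y):
    """Joint PDF of Example 2: (1 + xy)/4 on the square |x| < 1, |y| < 1."""
    return (1 + x * y) / 4 if abs(x) < 1 and abs(y) < 1 else 0.0

def prob(region, n=400):
    """Midpoint-rule estimate of P{(X, Y) in region} over the square (-1, 1)^2."""
    h = 2.0 / n
    grid = [-1 + (k + 0.5) * h for k in range(n)]
    return h * h * sum(f(x, y) for x in grid for y in grid if region(x, y))

a, b = 0.2, -0.3
joint = prob(lambda x, y: x <= a and y <= b)
product = prob(lambda x, y: x <= a) * prob(lambda x, y: y <= b)
print("X, Y:", joint, product)              # differ: X and Y are dependent

u, v = 0.25, 0.64
joint_sq = prob(lambda x, y: x * x <= u and y * y <= v)
product_sq = prob(lambda x, y: x * x <= u) * prob(lambda x, y: y * y <= v)
print("X^2, Y^2:", joint_sq, product_sq)    # agree: sqrt(u) * sqrt(v) = 0.4
```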


Example 3. We return to Buffon's needle problem, discussed in Examples 1.2.9 and 1.3.7. Suppose that the RV $R$, which represents the distance from the center of the needle to the nearest line, is uniformly distributed on $(0, l]$. Suppose further that $\Theta$, the angle that the needle forms with this line, is distributed uniformly on $[0, \pi)$. If $R$ and $\Theta$ are assumed to be independent, the joint PDF is given by
$$f_{R,\Theta}(r, \theta) = f_R(r) f_\Theta(\theta) = \begin{cases} \dfrac{1}{l} \cdot \dfrac{1}{\pi} & \text{if } 0 < r \le l,\ 0 \le \theta < \pi, \\ 0 & \text{otherwise.} \end{cases}$$

The needle will intersect the nearest line if and only if
$$\frac{l}{2} \sin \Theta > R.$$
Therefore, the required probability is given by
$$P\left\{\sin \Theta > \frac{2R}{l}\right\} = \int_0^{\pi} \int_0^{(l/2)\sin\theta} f_{R,\Theta}(r, \theta)\,dr\,d\theta = \frac{1}{\pi} \int_0^{\pi} \frac{\sin\theta}{2}\,d\theta = \frac{1}{\pi}.$$
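Buffon's result can be checked by simulating $R$ and $\Theta$ exactly as described in Example 3. This is a Monte Carlo sketch with an arbitrary seed and sample size. The estimate does not depend on $l$ and should be close to $1/\pi \approx 0.318$.

```python
import math
import random

def buffon_estimate(l=1.0, n=200_000, seed=3):
    """Simulate R ~ U(0, l] and Theta ~ U[0, pi), independent; count crossings."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n):
        r = l * (1.0 - rng.random())        # uniform on (0, l]
        theta = math.pi * rng.random()      # uniform on [0, pi)
        if (l / 2) * math.sin(theta) > r:   # needle meets the nearest line
            crossings += 1
    return crossings / n

p_hat = buffon_estimate()
print(p_hat, 1 / math.pi)
```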
Definition 2. A collection of jointly distributed RVs $X_1, X_2, \ldots, X_n$ is said to be mutually or completely independent if and only if
$$F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F_i(x_i) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n, \tag{6}$$
where $F$ is the joint DF of $(X_1, X_2, \ldots, X_n)$, and $F_i$ ($i = 1, 2, \ldots, n$) is the marginal DF of $X_i$. $X_1, \ldots, X_n$ are said to be pairwise independent if and only if every pair of them are independent.


It is clear that an analog of Theorem 1 holds, but we leave it to the reader to 
construct it. 


Example 4. In Example 1 we cannot write
$$f_\alpha(x_1, x_2, x_3) = f_1(x_1) f_2(x_2) f_3(x_3)$$
except when $\alpha = 0$. It follows that $X_1$, $X_2$, and $X_3$ are not independent except when $\alpha = 0$.


The following result is easy to prove. 


Theorem 3. If $X_1, X_2, \ldots, X_n$ are independent, every subcollection $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$ of $X_1, X_2, \ldots, X_n$ is also independent.


Remark 1. It is quite possible for RVs $X_1, X_2, \ldots, X_n$ to be pairwise independent without being mutually independent. Let $(X, Y, Z)$ have the joint PMF defined by
$$P\{X = x, Y = y, Z = z\} = \begin{cases} \frac{3}{16} & \text{if } (x, y, z) \in \{(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)\}, \\ \frac{1}{16} & \text{if } (x, y, z) \in \{(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)\}. \end{cases}$$
Clearly, $X$, $Y$, $Z$ are not independent. (Why?) We have
$$\begin{aligned} P\{X = x, Y = y\} &= \tfrac{1}{4}, \quad (x, y) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\ P\{Y = y, Z = z\} &= \tfrac{1}{4}, \quad (y, z) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\ P\{X = x, Z = z\} &= \tfrac{1}{4}, \quad (x, z) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\ P\{X = x\} &= \tfrac{1}{2}, \quad x = 0, x = 1, \\ P\{Y = y\} &= \tfrac{1}{2}, \quad y = 0, y = 1, \end{aligned}$$
and
$$P\{Z = z\} = \tfrac{1}{2}, \quad z = 0, z = 1.$$
It follows that $X$ and $Y$, $Y$ and $Z$, and $X$ and $Z$ are pairwise independent.
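The claims of Remark 1 can be verified exactly by enumeration. The Python sketch below uses exact `Fraction` arithmetic, and its helper names are hypothetical. It tests whether each two-dimensional marginal factors into one-dimensional marginals, and then does the same for the full triple.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of Remark 1 on {0, 1}^3.
heavy = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
pmf = {p: Fraction(3, 16) if p in heavy else Fraction(1, 16)
       for p in product((0, 1), repeat=3)}

def marginal(coords):
    """Marginal PMF of the components listed in coords, e.g. (0, 2) for (X, Z)."""
    out = {}
    for point, p in pmf.items():
        key = tuple(point[i] for i in coords)
        out[key] = out.get(key, 0) + p
    return out

def independent(coords):
    """True if the marginal of coords equals the product of its 1-d marginals."""
    joint = marginal(coords)
    ones = [marginal((i,)) for i in coords]
    for key, p in joint.items():
        prod_p = Fraction(1)
        for m, value in zip(ones, key):
            prod_p *= m[(value,)]
        if p != prod_p:
            return False
    return True

print([independent(pair) for pair in ((0, 1), (1, 2), (0, 2))])  # all True
print(independent((0, 1, 2)))                                     # False
```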


Definition 3. A sequence $\{X_n\}$ of RVs is said to be independent if for every $n = 2, 3, 4, \ldots$ the RVs $X_1, X_2, \ldots, X_n$ are independent.


Similarly, one can speak of an independent family of RVs. 


Definition 4. We say that RVs $X$ and $Y$ are identically distributed if $X$ and $Y$ have the same DF, that is,
$$F_X(x) = F_Y(x) \quad \text{for all } x \in \mathbb{R},$$
where $F_X$ and $F_Y$ are the DFs of $X$ and $Y$, respectively.

Definition 5. We say that $\{X_n\}$ is a sequence of independent, identically distributed (iid) RVs with common law $\mathcal{L}(X)$ if $\{X_n\}$ is an independent sequence of RVs and the distribution of $X_n$ ($n = 1, 2, \ldots$) is the same as that of $X$.

According to Definition 4, $X$ and $Y$ are identically distributed if and only if they have the same distribution. It does not follow that $X = Y$ with probability 1 (see Problem 7). If $P\{X = Y\} = 1$, we say that $X$ and $Y$ are equivalent RVs. All Definition 4 says is that $X$ and $Y$ are identically distributed if and only if
$$P\{X \in A\} = P\{Y \in A\} \quad \text{for all } A \in \mathcal{B}.$$
Nothing is said about the equality of the events $\{X \in A\}$ and $\{Y \in A\}$.


Definition 6. Two multiple RVs $(X_1, X_2, \ldots, X_m)$ and $(Y_1, Y_2, \ldots, Y_n)$ are said to be independent if
$$F(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) = F_1(x_1, x_2, \ldots, x_m)\, F_2(y_1, y_2, \ldots, y_n) \tag{7}$$
for all $(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) \in \mathbb{R}^{m+n}$, where $F$, $F_1$, $F_2$ are the joint distribution functions of $(X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n)$, $(X_1, X_2, \ldots, X_m)$, and $(Y_1, Y_2, \ldots, Y_n)$, respectively.


Of course, the independence of $X = (X_1, X_2, \ldots, X_m)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ does not imply the independence of the components $X_1, X_2, \ldots, X_m$ of $X$ or the components $Y_1, Y_2, \ldots, Y_n$ of $Y$.


Theorem 4. Let $X = (X_1, X_2, \ldots, X_m)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ be independent RVs. Then the component $X_j$ of $X$ ($j = 1, 2, \ldots, m$) and the component $Y_k$ of $Y$ ($k = 1, 2, \ldots, n$) are independent RVs. If $h$ and $g$ are Borel-measurable functions, $h(X_1, X_2, \ldots, X_m)$ and $g(Y_1, Y_2, \ldots, Y_n)$ are independent.


Remark 2. It is possible that an RV $X$ may be independent of $Y$ and also of $Z$, but $X$ may not be independent of the random vector $(Y, Z)$. See the example in Remark 1.


Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed RVs with common DF $F$. Then the joint DF $G$ of $(X_1, X_2, \ldots, X_n)$ is given by
$$G(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} F(x_j).$$
We note that for any of the $n!$ permutations $(x_{i_1}, x_{i_2}, \ldots, x_{i_n})$ of $(x_1, x_2, \ldots, x_n)$,
$$G(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} F(x_{i_j}) = G(x_{i_1}, x_{i_2}, \ldots, x_{i_n}),$$
so that $G$ is a symmetric function of $x_1, x_2, \ldots, x_n$. Thus $(X_1, X_2, \ldots, X_n) \stackrel{d}{=} (X_{i_1}, X_{i_2}, \ldots, X_{i_n})$, where $X \stackrel{d}{=} Y$ means that $X$ and $Y$ are identically distributed RVs.

Definition 7. The RVs $X_1, X_2, \ldots, X_n$ are said to be exchangeable if
$$(X_1, X_2, \ldots, X_n) \stackrel{d}{=} (X_{i_1}, X_{i_2}, \ldots, X_{i_n})$$
for all $n!$ permutations $(i_1, i_2, \ldots, i_n)$ of $(1, 2, \ldots, n)$. The RVs in the sequence $\{X_n\}$ are said to be exchangeable if $X_1, X_2, \ldots, X_n$ are exchangeable for each $n$.

Clearly if $X_1, X_2, \ldots, X_n$ are exchangeable, then the $X_i$ are identically distributed but not necessarily independent.
Example 5. Suppose that $X$, $Y$, $Z$ have joint PDF
$$f(x, y, z) = \begin{cases} \frac{2}{3}(x + y + z), & 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1, \\ 0, & \text{otherwise.} \end{cases}$$
Then $X$, $Y$, $Z$ are exchangeable but not independent.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Let $S_n = \sum_{j=1}^{n} X_j$, $n = 1, 2, \ldots$, and $Y_k = X_k - S_n/n$, $k = 1, 2, \ldots, n - 1$. Then $Y_1, Y_2, \ldots, Y_{n-1}$ are exchangeable.


Theorem 5. Let $X$, $Y$ be exchangeable RVs. Then $X - Y$ has a symmetric distribution.


The proof is simple. 


Definition 8. Let $X$ be an RV, and let $X'$ be an RV that is independent of $X$ with $X' \stackrel{d}{=} X$. We call the RV
$$X^s = X - X'$$
the symmetrized $X$.
In view of Theorem 5, $X^s$ is symmetric about zero, so that
$$P\{X^s \ge 0\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X^s \le 0\} \ge \tfrac{1}{2}.$$

If $E|X| < \infty$, then $E|X^s| \le 2E|X| < \infty$, and $EX^s = 0$.
The technique of symmetrization is an important tool in the study of probability 
limit theorems. We will need the following result later. The proof is left to the reader. 


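Theorem 6. For $\varepsilon > 0$,

(a) $P\{|X^s| > \varepsilon\} \le 2P\{|X| > \varepsilon/2\}$.
(b) If $a \ge 0$ is such that $P\{X \ge a\} \le 1 - p$ and $P\{X \le -a\} \le 1 - p$, then
$$P\{|X^s| \ge \varepsilon\} \ge p\,P\{|X| > a + \varepsilon\}$$
for $\varepsilon > 0$.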


PROBLEMS 4.3 


1. Let $A$ be a set of $k$ numbers and $\Omega$ be the set of all ordered samples of size $n$ from $A$ with replacement. Also, let $\mathcal{S}$ be the set of all subsets of $\Omega$ and $P$ be a probability defined on $\mathcal{S}$. Let $X_1, X_2, \ldots, X_n$ be RVs defined on $(\Omega, \mathcal{S}, P)$ by setting
$$X_i(a_1, a_2, \ldots, a_n) = a_i \quad (i = 1, 2, \ldots, n).$$
Show that $X_1, X_2, \ldots, X_n$ are independent if and only if each sample point is equally likely.

2. Let $X_1$, $X_2$ be iid RVs with common PMF
$$P\{X = \pm 1\} = \tfrac{1}{2}.$$
Write $X_3 = X_1 X_2$. Show that $X_1$, $X_2$, $X_3$ are pairwise independent but not independent.
3. Let $(X_1, X_2, X_3)$ be an RV with joint PMF
$$f(x_1, x_2, x_3) = \begin{cases} \frac{1}{4} & \text{if } (x_1, x_2, x_3) \in A, \\ 0 & \text{otherwise,} \end{cases}$$
where
$$A = \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\}.$$
Are $X_1$, $X_2$, $X_3$ independent? Are $X_1$, $X_2$, $X_3$ pairwise independent? Are $X_1 + X_2$ and $X_3$ independent?


4. Let $X$ and $Y$ be independent RVs such that $XY$ is degenerate at $c \ne 0$. That is, $P\{XY = c\} = 1$. Show that $X$ and $Y$ are also degenerate.


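5. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$. Define $X$ and $Y$ so that
$$X(\omega) = I_A(\omega), \quad Y(\omega) = I_B(\omega) \quad \text{for all } \omega \in \Omega.$$
Show that $X$ and $Y$ are independent if and only if $A$ and $B$ are independent.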
6. Let $X_1, X_2, \ldots, X_n$ be a set of exchangeable RVs. Then
$$E\left(\frac{X_1 + X_2 + \cdots + X_k}{X_1 + X_2 + \cdots + X_n}\right) = \frac{k}{n}, \quad 1 \le k \le n.$$


7. Let $X$ and $Y$ be identically distributed. Construct an example to show that $X$ and $Y$ need not be equal; that is, $P\{X = Y\}$ need not equal 1.


8. Prove Lemma 1. 


9. Let $X_1, X_2, \ldots, X_n$ be RVs with joint PDF $f$, and let $f_j$ be the marginal PDF of $X_j$ ($j = 1, 2, \ldots, n$). Show that $X_1, X_2, \ldots, X_n$ are independent if and only if
$$f(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} f_j(x_j) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n.$$
10. Suppose that two buses, A and B, operate on a route. A person arrives at a certain bus stop on this route at time 0. Let $X$ and $Y$ be the arrival times of buses A and B, respectively, at this bus stop. Suppose that $X$ and $Y$ are independent and have density functions given, respectively, by
$$f_1(x) = \frac{1}{a}, \quad 0 \le x \le a, \text{ and zero elsewhere,}$$
and
$$f_2(y) = \frac{1}{b}, \quad 0 \le y \le b, \text{ and zero otherwise.}$$
What is the probability that bus A will arrive before bus B?


11. Consider two batteries, one of brand A and the other of brand B. Brand A batteries have a length of life with density function
$$f(x) = 3\lambda x^2 \exp(-\lambda x^3), \quad x > 0, \text{ and zero elsewhere,}$$
whereas brand B batteries have a length of life with density function given by
$$g(y) = 3\mu y^2 \exp(-\mu y^3), \quad y > 0, \text{ and zero elsewhere.}$$
Brand A and brand B batteries operate independently and are put to a test. What is the probability that the brand B battery will outlast brand A? In particular, what is the probability if $\lambda = \mu$?


12. (a) Let $(X, Y)$ have joint density $f$. Show that $X$ and $Y$ are independent if and only if for some constant $k > 0$ and nonnegative functions $f_1$ and $f_2$,
$$f(x, y) = k f_1(x) f_2(y)$$
for all $x, y \in \mathbb{R}$.
(b) Let $A = \{f_X(x) > 0\}$ and $B = \{f_Y(y) > 0\}$, where $f_X$, $f_Y$ are the marginal densities of $X$ and $Y$, respectively. Show that if $X$ and $Y$ are independent, then $\{f > 0\} = A \times B$.


13. If $\phi$ is the CF of $X$, show that the CF of $X^s$ is real and even.


14. Let $X$, $Y$ be jointly distributed with PDF $f(x, y) = (1 - x^3 y)/4$ for $|x| < 1$, $|y| < 1$, and $= 0$ otherwise. Show that $X \stackrel{d}{=} Y$ and that $X - Y$ has a symmetric distribution.


4.4 FUNCTIONS OF SEVERAL RANDOM VARIABLES


Let $X_1, X_2, \ldots, X_n$ be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$. In practice we deal with functions of $X_1, X_2, \ldots, X_n$ such as $X_1 + X_2$, $X_1 - X_2$, $X_1 X_2$, $\min(X_1, \ldots, X_n)$, and so on. Are these also RVs? If so, how do we compute their distribution given the joint distribution of $X_1, X_2, \ldots, X_n$? What functions of $(X_1, X_2, \ldots, X_n)$ are RVs?


Theorem 1. Let $g: \mathbb{R}^n \to \mathbb{R}^m$ be a Borel-measurable function; that is, if $B \in \mathcal{B}_m$, then $g^{-1}(B) \in \mathcal{B}_n$. If $X = (X_1, X_2, \ldots, X_n)$ is an $n$-dimensional RV ($n \ge 1$), then $g(X)$ is an $m$-dimensional RV.

Proof. For $B \in \mathcal{B}_m$,
$$\{g(X_1, X_2, \ldots, X_n) \in B\} = \{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\},$$
and since $g^{-1}(B) \in \mathcal{B}_n$, it follows that $\{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\} \in \mathcal{S}$, which concludes the proof.

In particular, if $g: \mathbb{R}^n \to \mathbb{R}^m$ is a continuous function, then $g(X_1, X_2, \ldots, X_n)$ is an RV.

How do we compute the distribution of $g(X_1, X_2, \ldots, X_n)$? There are several ways to go about it. We first consider the method of distribution functions. Suppose that $Y = g(X_1, \ldots, X_n)$ is real-valued, and let $y \in \mathbb{R}$. Then
$$P\{Y \le y\} = P\{g(X_1, \ldots, X_n) \le y\} = \begin{cases} \displaystyle\sum_{\{(x_1, \ldots, x_n):\, g(x_1, \ldots, x_n) \le y\}} P\{X_1 = x_1, \ldots, X_n = x_n\} & \text{in the discrete case,} \\ \displaystyle\int \cdots \int_{\{(x_1, \ldots, x_n):\, g(x_1, \ldots, x_n) \le y\}} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n & \text{in the continuous case,} \end{cases}$$
where in the continuous case $f$ is the joint PDF of $(X_1, \ldots, X_n)$.

In the continuous case we can obtain the PDF of $Y = g(X_1, \ldots, X_n)$ by differentiating the DF $P\{Y \le y\}$ with respect to $y$, provided that $Y$ is also of the continuous type. In the discrete case it is easier to compute $P\{g(X_1, \ldots, X_n) = y\}$.


We take a few examples.


Example 1. Consider the bivariate negative binomial distribution with PMF
$$P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\,y!\,(k - 1)!}\, p_1^x p_2^y (1 - p_1 - p_2)^k,$$
where $x, y = 0, 1, 2, \ldots$; $k \ge 1$ is an integer; $p_1, p_2 \in (0, 1)$; and $p_1 + p_2 < 1$. Let us find the PMF of $U = X + Y$. We introduce an RV $V = Y$ (see Remark 1 below) so that $u = x + y$, $v = y$ represents a one-to-one mapping of $A = \{(x, y): x, y = 0, 1, 2, \ldots\}$ onto the set $B = \{(u, v): v = 0, 1, 2, \ldots, u;\ u = 0, 1, 2, \ldots\}$ with inverse map $x = u - v$, $y = v$. It follows that the joint PMF of $(U, V)$ is given by
$$P\{U = u, V = v\} = \begin{cases} \dfrac{(u + k - 1)!}{(u - v)!\,v!\,(k - 1)!}\, p_1^{u - v} p_2^{v} (1 - p_1 - p_2)^k & \text{for } (u, v) \in B, \\ 0 & \text{otherwise.} \end{cases}$$
The marginal PMF of $U$ is given by
$$\begin{aligned} P\{U = u\} &= \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!\,u!} \sum_{v=0}^{u} \binom{u}{v} p_1^{u - v} p_2^{v} \\ &= \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!\,u!}\, (p_1 + p_2)^u \\ &= \binom{u + k - 1}{k - 1} (p_1 + p_2)^u (1 - p_1 - p_2)^k \quad (u = 0, 1, 2, \ldots). \end{aligned}$$
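The identity behind the marginal PMF of $U$ can be confirmed with exact rational arithmetic. In this sketch the parameter values $k = 3$, $p_1 = 1/5$, $p_2 = 3/10$ are arbitrary choices for illustration. Summing the joint PMF along $x + y = u$ should reproduce the closed form exactly.

```python
from fractions import Fraction
from math import comb, factorial

def joint_pmf(x, y, k, p1, p2):
    """Bivariate negative binomial PMF of Example 1."""
    coef = Fraction(factorial(x + y + k - 1),
                    factorial(x) * factorial(y) * factorial(k - 1))
    return coef * p1**x * p2**y * (1 - p1 - p2)**k

def pmf_U(u, k, p1, p2):
    """Closed-form PMF of U = X + Y derived in Example 1."""
    return comb(u + k - 1, k - 1) * (p1 + p2)**u * (1 - p1 - p2)**k

k, p1, p2 = 3, Fraction(1, 5), Fraction(3, 10)
for u in range(5):
    direct = sum(joint_pmf(u - v, v, k, p1, p2) for v in range(u + 1))
    print(u, direct, pmf_U(u, k, p1, p2))
```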


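Example 2. Let $(X_1, X_2)$ have uniform distribution on the triangle $\{0 < x_1 < x_2 < 1\}$; that is, $(X_1, X_2)$ has joint density function
$$f(x_1, x_2) = \begin{cases} 2, & 0 < x_1 < x_2 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$
Let $Y = X_1 + X_2$. Then for $y < 0$, $P\{Y \le y\} = 0$, and for $y > 2$, $P\{Y \le y\} = 1$. For $0 \le y \le 2$, we have
$$P\{Y \le y\} = P\{X_1 + X_2 \le y\} = \iint_{\substack{0 < x_1 < x_2 < 1 \\ x_1 + x_2 \le y}} f(x_1, x_2)\,dx_1\,dx_2.$$
There are two cases to consider according to whether $0 \le y \le 1$ or $1 < y \le 2$ (Fig. 1a and b). In the former case,
$$P\{Y \le y\} = \int_{x_1=0}^{y/2} \left(\int_{x_2=x_1}^{y - x_1} 2\,dx_2\right) dx_1 = 2\int_0^{y/2} (y - 2x_1)\,dx_1 = \frac{y^2}{2},$$
and in the latter case,
$$P\{Y \le y\} = 1 - P\{Y > y\} = 1 - \int_{x_2=y/2}^{1} \left(\int_{x_1=y - x_2}^{x_2} 2\,dx_1\right) dx_2 = 1 - 2\int_{y/2}^{1} (2x_2 - y)\,dx_2 = 1 - \frac{(2 - y)^2}{2}.$$
Hence the density function of $Y$ is given by
$$f_Y(y) = \begin{cases} y, & 0 < y < 1, \\ 2 - y, & 1 \le y < 2, \\ 0, & \text{elsewhere.} \end{cases}$$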


Fig. 1. (a) $\{x_1 + x_2 \le y,\ 0 < x_1 < x_2 < 1,\ 0 < y \le 1\}$; (b) $\{x_1 + x_2 > y,\ 0 < x_1 < x_2 < 1,\ 1 < y \le 2\}$.
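Example 2 can be checked by simulation. The sketch assumes a standard fact that the text does not state: the min and max of two independent uniform $(0, 1)$ RVs are uniformly distributed on the triangle $0 < x_1 < x_2 < 1$. Since $\min(U, V) + \max(U, V) = U + V$, the RV $Y$ is just the sum of two independent uniforms, which is consistent with the triangular density $y$, $2 - y$ found above. The seed and sample size are arbitrary.

```python
import random

def df_Y(y):
    """DF of Y = X1 + X2 derived in Example 2."""
    if y <= 0:
        return 0.0
    if y <= 1:
        return y * y / 2
    if y <= 2:
        return 1 - (2 - y) ** 2 / 2
    return 1.0

rng = random.Random(11)
n = 200_000
ys = []
for _ in range(n):
    u, v = rng.random(), rng.random()
    x1, x2 = min(u, v), max(u, v)   # (X1, X2) uniform on 0 < x1 < x2 < 1
    ys.append(x1 + x2)

for y in (0.5, 1.0, 1.5):
    print(y, sum(t <= y for t in ys) / n, df_Y(y))
```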


The method of distribution functions can also be used in the case when $g$ takes values in $\mathbb{R}^m$, $1 < m \le n$, but the integration becomes more involved.


Example 3. Let $X_1$ be the time that a customer takes from getting in line at a service desk in a bank to completion of service, and let $X_2$ be the time she waits in line before she reaches the service desk. Then $X_1 \ge X_2$ and $X_1 - X_2$ is the service time of the customer. Suppose that the joint density of $(X_1, X_2)$ is given by
$$f(x_1, x_2) = \begin{cases} e^{-x_1}, & 0 < x_2 < x_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

Fig. 2. $A = \{x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 < x_2 < x_1 < \infty\}$.

Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Then the joint distribution of $(Y_1, Y_2)$ is given by
$$P\{Y_1 \le y_1, Y_2 \le y_2\} = \iint_A f(x_1, x_2)\,dx_1\,dx_2,$$
where $A = \{(x_1, x_2): x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 < x_2 < x_1 < \infty\}$. Clearly, $x_1 + x_2 > x_1 - x_2$, so that for $0 < y_2 < y_1$ the set $A$ is as shown in Fig. 2. It follows that


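$$\begin{aligned} P\{Y_1 \le y_1, Y_2 \le y_2\} &= \int_{x_2=0}^{(y_1 - y_2)/2} \left(\int_{x_1=x_2}^{x_2 + y_2} e^{-x_1}\,dx_1\right) dx_2 + \int_{x_2=(y_1 - y_2)/2}^{y_1/2} \left(\int_{x_1=x_2}^{y_1 - x_2} e^{-x_1}\,dx_1\right) dx_2 \\ &= \int_0^{(y_1 - y_2)/2} e^{-x_2}(1 - e^{-y_2})\,dx_2 + \int_{(y_1 - y_2)/2}^{y_1/2} \left(e^{-x_2} - e^{-(y_1 - x_2)}\right) dx_2 \\ &= (1 - e^{-y_2})\left(1 - e^{-(y_1 - y_2)/2}\right) + \left(e^{-(y_1 - y_2)/2} - e^{-y_1/2}\right) - e^{-y_1}\left(e^{y_1/2} - e^{(y_1 - y_2)/2}\right) \\ &= 1 - e^{-y_2} - 2e^{-y_1/2} + 2e^{-(y_1 + y_2)/2}. \end{aligned}$$
Hence the joint density of $Y_1, Y_2$ is given by
$$f_{Y_1, Y_2}(y_1, y_2) = \begin{cases} \tfrac{1}{2}\, e^{-(y_1 + y_2)/2}, & 0 < y_2 < y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$
The marginal densities of $Y_1$, $Y_2$ are easily obtained as
$$f_{Y_2}(y_2) = e^{-y_2} \text{ for } y_2 > 0, \text{ and } 0 \text{ elsewhere;}$$
and
$$f_{Y_1}(y_1) = e^{-y_1/2} - e^{-y_1} \text{ for } y_1 > 0, \text{ and } 0 \text{ elsewhere.}$$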
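Example 3 can be checked by simulation. The sampling scheme below is our own device, not the text's. It draws $X_2$ exponential with mean 1 and sets $X_1 = X_2 + E$, where $E$ is an independent exponential with mean 1. The pair then has density $e^{-x_2} e^{-(x_1 - x_2)} = e^{-x_1}$ on $0 < x_2 < x_1$, as required. The sketch compares the empirical joint DF at the arbitrary point $(y_1, y_2) = (3, 1)$ with the closed form. It also checks that the mean service time $Y_2$ is close to 1.

```python
import math
import random

def joint_df(y1, y2):
    """P{Y1 <= y1, Y2 <= y2} from Example 3, valid for 0 < y2 < y1."""
    return (1 - math.exp(-y2) - 2 * math.exp(-y1 / 2)
            + 2 * math.exp(-(y1 + y2) / 2))

rng = random.Random(5)
n = 200_000
count = 0
service_total = 0.0
for _ in range(n):
    x2 = rng.expovariate(1.0)            # waiting time, exponential with mean 1
    x1 = x2 + rng.expovariate(1.0)       # total time; density e^{-x1} on 0 < x2 < x1
    y1, y2 = x1 + x2, x1 - x2
    count += (y1 <= 3.0 and y2 <= 1.0)
    service_total += y2

print(count / n, joint_df(3.0, 1.0))     # empirical vs closed form
print(service_total / n)                 # Y2 has density e^{-y2}, so mean 1
```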
We next consider the method of transformations. Let $(X_1, \ldots, X_n)$ be jointly distributed with continuous PDF $f(x_1, x_2, \ldots, x_n)$, and let $y = g(x_1, x_2, \ldots, x_n) = (y_1, y_2, \ldots, y_n)$, where
$$y_i = g_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, n,$$
be a mapping of $\mathbb{R}^n$ to $\mathbb{R}^n$. Then
$$P\{(Y_1, Y_2, \ldots, Y_n) \in B\} = P\{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\} = \int \cdots \int_{g^{-1}(B)} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i,$$
where $g^{-1}(B) = \{x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n: g(x) \in B\}$. Let us choose $B$ to be the $n$-dimensional interval
$$B = B_y = \{(y_1', y_2', \ldots, y_n'): -\infty < y_i' \le y_i,\ i = 1, 2, \ldots, n\}.$$
Then the joint DF of $Y$ is given by
$$P\{Y \in B_y\} = G_Y(y) = P\{g_1(X) \le y_1, g_2(X) \le y_2, \ldots, g_n(X) \le y_n\} = \int \cdots \int_{g^{-1}(B_y)} f(x_1, \ldots, x_n) \prod_{i=1}^{n} dx_i,$$
and (if $G_Y$ is absolutely continuous) the PDF of $Y$ is given by
$$w(y) = \frac{\partial^n G_Y(y)}{\partial y_1\,\partial y_2 \cdots \partial y_n}$$
at every continuity point $y$ of $w$. Under certain conditions it is possible to write $w$ in terms of $f$ by making a change of variable in the multiple integral.


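Theorem 2. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV of the continuous type with PDF $f(x_1, x_2, \ldots, x_n)$.

(a) Let
$$y_1 = g_1(x_1, x_2, \ldots, x_n), \quad y_2 = g_2(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_n = g_n(x_1, x_2, \ldots, x_n)$$
be a one-to-one mapping of $\mathbb{R}^n$ into itself; that is, there exists the inverse transformation
$$x_1 = h_1(y_1, y_2, \ldots, y_n), \quad x_2 = h_2(y_1, y_2, \ldots, y_n), \quad \ldots, \quad x_n = h_n(y_1, y_2, \ldots, y_n)$$
defined over the range of the transformation.
(b) Assume that both the mapping and its inverse are continuous.
(c) Assume that the partial derivatives
$$\frac{\partial x_i}{\partial y_j}, \quad 1 \le i \le n,\ 1 \le j \le n,$$
exist and are continuous.
(d) Assume that the Jacobian $J$ of the inverse transformation
$$J = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix}$$
is different from zero for $(y_1, y_2, \ldots, y_n)$ in the range of the transformation.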
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Then (Yi, Y2,... , Үл) has a joint absolutely continuous DF with PDF given 
by 


(1) wO, у2,..., Ya) = IFO, -- Ул),... Ani, - ++ y). 
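Proof. For $(y_1, y_2, \ldots, y_n) \in \mathcal{R}_n$, let

$$B = \{(y_1', \ldots, y_n') \in \mathcal{R}_n : -\infty < y_i' \le y_i,\ i = 1, 2, \ldots, n\}.$$

Then

$$g^{-1}(B) = \{x \in \mathcal{R}_n : g(x) \in B\} = \{(x_1, \ldots, x_n) : g_i(x) \le y_i,\ i = 1, 2, \ldots, n\}$$

and

$$\begin{aligned}G_Y(y) = P\{Y \in B\} = P\{X \in g^{-1}(B)\} &= \int \cdots \int_{g^{-1}(B)} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n\\ &= \int_{-\infty}^{y_n} \cdots \int_{-\infty}^{y_1} f\big(h_1(y'), \ldots, h_n(y')\big)\left|\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1', \ldots, y_n')}\right| dy_1' \cdots dy_n'.\end{aligned}$$

Result (1) now follows on differentiation of the DF $G_Y$.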


Remark 1. In actual applications we will not know the mapping from $x_1, x_2, \ldots, x_n$ to $y_1, y_2, \ldots, y_n$ completely; usually only one or more of the functions $g_i$ will be known. If only $k$, $1 \le k < n$, of the $g_i$'s are known, we introduce arbitrarily $n - k$ functions such that the conditions of the theorem are satisfied. To find the joint marginal density of these $k$ variables, we simply integrate the $w$ function over all the $n - k$ variables that were introduced arbitrarily.


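Remark 2. An analog of Theorem 2.5.4 holds, which we state without proof.

Let $X = (X_1, X_2, \ldots, X_n)$ be an RV of the continuous type with joint PDF $f$, and let $y_i = g_i(x_1, x_2, \ldots, x_n)$, $i = 1, 2, \ldots, n$, be a mapping of $\mathcal{R}_n$ into itself. Suppose that for each $y$ the transformation $g$ has a finite number $k = k(y)$ of inverses. Suppose further that $\mathcal{R}_n$ can be partitioned into $k$ disjoint sets $A_1, A_2, \ldots, A_k$, such that the transformation $g$ from $A_i$ $(i = 1, 2, \ldots, k)$ into $\mathcal{R}_n$ is one-to-one with inverse transformation

$$x_1 = h_{1i}(y_1, y_2, \ldots, y_n),\ \ldots,\ x_n = h_{ni}(y_1, y_2, \ldots, y_n), \qquad i = 1, 2, \ldots, k.$$

Suppose that the first partial derivatives are continuous and that each Jacobian

$$J_i = \begin{vmatrix} \dfrac{\partial h_{1i}}{\partial y_1} & \dfrac{\partial h_{1i}}{\partial y_2} & \cdots & \dfrac{\partial h_{1i}}{\partial y_n}\\ \dfrac{\partial h_{2i}}{\partial y_1} & \dfrac{\partial h_{2i}}{\partial y_2} & \cdots & \dfrac{\partial h_{2i}}{\partial y_n}\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial h_{ni}}{\partial y_1} & \dfrac{\partial h_{ni}}{\partial y_2} & \cdots & \dfrac{\partial h_{ni}}{\partial y_n}\end{vmatrix}$$

is different from zero in the range of the transformation. Then the joint PDF of $Y$ is given by

$$w(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{k} |J_i|\, f\big(h_{1i}(y_1, \ldots, y_n), \ldots, h_{ni}(y_1, \ldots, y_n)\big).$$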


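Example 4. Let $X_1, X_2, X_3$ be iid RVs with common exponential density function

$$f(x) = \begin{cases} e^{-x}, & x > 0,\\ 0, & \text{otherwise.}\end{cases}$$

Also, let

$$Y_1 = X_1 + X_2 + X_3, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad\text{and}\qquad Y_3 = \frac{X_1}{X_1 + X_2}.$$

Then

$$x_1 = y_1 y_2 y_3, \qquad x_2 = y_1 y_2 - x_1 = y_1 y_2 (1 - y_3), \qquad\text{and}\qquad x_3 = y_1 - y_1 y_2 = y_1(1 - y_2).$$

The Jacobian of the transformation is given by

$$J = \begin{vmatrix} y_2 y_3 & y_1 y_3 & y_1 y_2\\ y_2(1 - y_3) & y_1(1 - y_3) & -y_1 y_2\\ 1 - y_2 & -y_1 & 0 \end{vmatrix} = -y_1^2 y_2.$$

Note that $0 < y_1 < \infty$, $0 < y_2 < 1$, and $0 < y_3 < 1$. Thus the joint PDF of $Y_1, Y_2, Y_3$ is given by

$$w(y_1, y_2, y_3) = y_1^2 y_2 e^{-y_1} = (2y_2)\cdot\left(\tfrac{1}{2} y_1^2 e^{-y_1}\right)\cdot 1, \qquad 0 < y_1 < \infty,\ 0 < y_2, y_3 < 1.$$

It follows that $Y_1$, $Y_2$, and $Y_3$ are independent.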
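The Jacobian above can be double-checked numerically. The sketch below approximates the matrix $[\partial x_i/\partial y_j]$ of the inverse map by central differences and compares its determinant with $-y_1^2y_2$; the helper names and the test point are illustrative choices, not part of the text.

```python
def inverse_map(y1, y2, y3):
    """Inverse transformation of Example 4: returns (x1, x2, x3)."""
    return (y1 * y2 * y3, y1 * y2 * (1 - y3), y1 * (1 - y2))


def det3(m):
    """Determinant of a 3x3 matrix (list of rows) by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def numeric_jacobian(h, y, eps=1e-6):
    """Central-difference approximation of the matrix [dx_i / dy_j] of the map h."""
    k = len(y)
    jac = [[0.0] * k for _ in range(k)]
    for j in range(k):
        up, down = list(y), list(y)
        up[j] += eps
        down[j] -= eps
        xu, xd = h(*up), h(*down)
        for i in range(k):
            jac[i][j] = (xu[i] - xd[i]) / (2 * eps)
    return jac


y = (2.5, 0.4, 0.7)
J_numeric = det3(numeric_jacobian(inverse_map, y))
J_closed = -y[0] ** 2 * y[1]          # the closed form -y1^2 y2
print(J_numeric, J_closed)
```

Because each $x_i$ is linear in each $y_j$ separately, the central differences are exact up to rounding, so the two numbers should agree to many digits.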
Fig. 3. $\{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}$.


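Example 5. Let $X_1, X_2$ be independent RVs with common density given by

$$f(x) = \begin{cases}1, & 0 < x < 1,\\ 0, & \text{otherwise.}\end{cases}$$

Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, so that $x_1 = (y_1 + y_2)/2$ and $x_2 = (y_1 - y_2)/2$. Then the Jacobian of the transformation is given by

$$J = \begin{vmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & -\tfrac12\end{vmatrix} = -\frac12,$$

and the joint density of $Y_1, Y_2$ (Fig. 3) is given by

$$f_{Y_1,Y_2}(y_1, y_2) = \frac12 f\!\left(\frac{y_1+y_2}{2}\right) f\!\left(\frac{y_1-y_2}{2}\right) = \begin{cases}\tfrac12 & \text{if } (y_1, y_2) \in \{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\},\\ 0 & \text{otherwise.}\end{cases}$$

The marginal PDFs of $Y_1$ and $Y_2$ are given by

$$f_{Y_1}(y_1) = \begin{cases}\int_{-y_1}^{y_1} \tfrac12\,dy_2 = y_1, & 0 < y_1 \le 1,\\ \int_{y_1-2}^{2-y_1} \tfrac12\,dy_2 = 2 - y_1, & 1 < y_1 < 2,\\ 0, & \text{otherwise;}\end{cases}$$

$$f_{Y_2}(y_2) = \begin{cases}\int_{-y_2}^{y_2+2} \tfrac12\,dy_1 = y_2 + 1, & -1 < y_2 \le 0,\\ \int_{y_2}^{2-y_2} \tfrac12\,dy_1 = 1 - y_2, & 0 < y_2 < 1,\\ 0, & \text{otherwise.}\end{cases}$$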
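To see the two triangular marginals of Example 5 emerge, here is a small simulation sketch (seed, sample size, and test points are arbitrary choices). The DFs below are obtained by integrating the marginal PDFs; for instance $P\{Y_1 \le 1/2\} = 1/8$ and $P\{Y_2 \le 1/2\} = 7/8$.

```python
import random


def cdf_y1(y):
    """DF of Y1 = X1 + X2 obtained by integrating its triangular PDF on (0, 2)."""
    if y <= 0:
        return 0.0
    if y <= 1:
        return y * y / 2
    if y <= 2:
        return 1 - (2 - y) ** 2 / 2
    return 1.0


def cdf_y2(y):
    """DF of Y2 = X1 - X2 obtained by integrating its triangular PDF on (-1, 1)."""
    if y <= -1:
        return 0.0
    if y <= 0:
        return (y + 1) ** 2 / 2
    if y <= 1:
        return 1 - (1 - y) ** 2 / 2
    return 1.0


rng = random.Random(7)
n = 200_000
draws = [(rng.random(), rng.random()) for _ in range(n)]   # independent U(0, 1) pairs
y1 = [a + b for a, b in draws]
y2 = [a - b for a, b in draws]

for t in (0.5, 1.0, 1.5):
    print("Y1", t, sum(v <= t for v in y1) / n, cdf_y1(t))
for t in (-0.5, 0.0, 0.5):
    print("Y2", t, sum(v <= t for v in y2) / n, cdf_y2(t))
```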
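Example 6. Let $X_1, X_2, X_3$ be iid RVs with common PDF

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Let $Y_1 = (X_1 - X_2)/\sqrt2$, $Y_2 = (X_1 + X_2 - 2X_3)/\sqrt6$, and $Y_3 = (X_1 + X_2 + X_3)/\sqrt3$. Then

$$x_1 = \frac{y_1}{\sqrt2} + \frac{y_2}{\sqrt6} + \frac{y_3}{\sqrt3}, \qquad x_2 = -\frac{y_1}{\sqrt2} + \frac{y_2}{\sqrt6} + \frac{y_3}{\sqrt3}, \qquad x_3 = -\frac{2y_2}{\sqrt6} + \frac{y_3}{\sqrt3},$$

and

$$J = \begin{vmatrix}\frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{1}{\sqrt3}\\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{1}{\sqrt3}\\ 0 & -\frac{2}{\sqrt6} & \frac{1}{\sqrt3}\end{vmatrix} = 1.$$

The joint PDF of $X_1, X_2, X_3$ is given by

$$g(x_1, x_2, x_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left\{-\frac{x_1^2 + x_2^2 + x_3^2}{2}\right\}, \qquad x_1, x_2, x_3 \in \mathcal{R}.$$

It is easily checked that

$$x_1^2 + x_2^2 + x_3^2 = y_1^2 + y_2^2 + y_3^2,$$

so that the joint PDF of $Y_1, Y_2, Y_3$ is given by

$$w(y_1, y_2, y_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left\{-\frac{y_1^2 + y_2^2 + y_3^2}{2}\right\}.$$

It follows that $Y_1, Y_2, Y_3$ are also iid RVs with common PDF $f$.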
In Example 6 the transformation used is orthogonal and is known as Helmert's transformation. In fact, we will show in Section 7.6 that under orthogonal transformations iid RVs with PDF $f$ defined above are transformed into iid RVs with the same PDF.

In Example 6 it is easily verified that

$$y_1^2 + y_2^2 = \sum_{j=1}^{3}\left(x_j - \frac{x_1 + x_2 + x_3}{3}\right)^2.$$

(Indeed, $y_3^2 = (x_1 + x_2 + x_3)^2/3$, so $y_1^2 + y_2^2 = \sum_{j} x_j^2 - 3\bar{x}^2$ with $\bar{x} = (x_1 + x_2 + x_3)/3$.) We have therefore proved that $X_1 + X_2 + X_3$ is independent of $\sum_{j=1}^{3}\{X_j - (X_1 + X_2 + X_3)/3\}^2$. This is a very important result in mathematical statistics, and we will return to it in Section 7.5.
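The two facts used above, that Helmert's matrix has orthonormal rows and that $y_1^2 + y_2^2$ equals the sum of squared deviations about $\bar{x}$, can be checked numerically. A minimal sketch (the random test vector is arbitrary):

```python
import math
import random

s2, s6, s3 = math.sqrt(2), math.sqrt(6), math.sqrt(3)
# Rows give Y1, Y2, Y3 as linear combinations of X1, X2, X3 (Example 6).
H = [[1 / s2, -1 / s2, 0.0],
     [1 / s6, 1 / s6, -2 / s6],
     [1 / s3, 1 / s3, 1 / s3]]


def is_orthonormal(m, tol=1e-12):
    """True if the rows of m are pairwise orthogonal unit vectors."""
    k = len(m)
    return all(abs(sum(m[i][t] * m[j][t] for t in range(k)) - (1.0 if i == j else 0.0)) < tol
               for i in range(k) for j in range(k))


rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(3)]
y = [sum(h * xi for h, xi in zip(row, x)) for row in H]
xbar = sum(x) / 3
print(is_orthonormal(H))
print(sum(v * v for v in x), sum(v * v for v in y))              # agree up to rounding
print(y[0] ** 2 + y[1] ** 2, sum((v - xbar) ** 2 for v in x))     # agree up to rounding
```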


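Example 7. Let $(X, Y)$ be a bivariate normal RV with joint PDF

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\},$$

$-\infty < x < \infty$, $-\infty < y < \infty$; $\mu_1 \in \mathcal{R}$, $\mu_2 \in \mathcal{R}$, and $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$. Let

$$U_1 = \sqrt{X^2 + Y^2} \qquad\text{and}\qquad U_2 = \frac{X}{Y}.$$

For $u_1 > 0$,

$$\sqrt{x^2 + y^2} = u_1 \qquad\text{and}\qquad \frac{x}{y} = u_2$$

have two solutions:

$$x_1 = \frac{u_1 u_2}{\sqrt{1 + u_2^2}},\quad y_1 = \frac{u_1}{\sqrt{1 + u_2^2}} \qquad\text{and}\qquad x_2 = -x_1,\quad y_2 = -y_1,$$

for any $u_2 \in \mathcal{R}$. The Jacobians are given by

$$J_1 = J_2 = \begin{vmatrix}\dfrac{u_2}{\sqrt{1+u_2^2}} & \dfrac{u_1}{(1+u_2^2)^{3/2}}\\[2ex] \dfrac{1}{\sqrt{1+u_2^2}} & -\dfrac{u_1u_2}{(1+u_2^2)^{3/2}}\end{vmatrix} = -\frac{u_1}{1 + u_2^2}.$$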
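It follows from the result in Remark 2 that the joint PDF of $(U_1, U_2)$ is given by

$$w(u_1, u_2) = \begin{cases}\dfrac{u_1}{1+u_2^2}\left[f\!\left(\dfrac{u_1u_2}{\sqrt{1+u_2^2}}, \dfrac{u_1}{\sqrt{1+u_2^2}}\right) + f\!\left(-\dfrac{u_1u_2}{\sqrt{1+u_2^2}}, -\dfrac{u_1}{\sqrt{1+u_2^2}}\right)\right] & \text{if } u_1 > 0,\ u_2 \in \mathcal{R},\\[2ex] 0 & \text{otherwise.}\end{cases}$$

In the special case where $\mu_1 = \mu_2 = 0$, $\rho = 0$, and $\sigma_1 = \sigma_2 = \sigma$, we have

$$f(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)},$$

so that $X$ and $Y$ are independent. Moreover,

$$f(x, y) = f(-x, -y),$$

and it follows that when $X$ and $Y$ are independent,

$$w(u_1, u_2) = \begin{cases}\dfrac{1}{2\pi\sigma^2}\,\dfrac{2u_1}{1 + u_2^2}\,e^{-u_1^2/(2\sigma^2)}, & u_1 > 0,\ -\infty < u_2 < \infty,\\[1ex] 0, & \text{otherwise.}\end{cases}$$

Since

$$w(u_1, u_2) = \frac{1}{\pi(1 + u_2^2)} \cdot \frac{u_1}{\sigma^2} e^{-u_1^2/(2\sigma^2)},$$

it follows that $U_1$ and $U_2$ are independent with marginal PDFs given by

$$w_1(u_1) = \begin{cases}\dfrac{u_1}{\sigma^2} e^{-u_1^2/(2\sigma^2)}, & u_1 > 0,\\[1ex] 0, & u_1 \le 0,\end{cases}$$

and

$$w_2(u_2) = \frac{1}{\pi(1 + u_2^2)}, \qquad -\infty < u_2 < \infty,$$

respectively.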
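A simulation sketch of the special case may make the result concrete: with $X, Y$ iid normal with mean 0 and standard deviation $\sigma$, the Rayleigh law gives $P\{U_1 \le \sigma\} = 1 - e^{-1/2}$, the Cauchy law gives $P\{|U_2| \le 1\} = 1/2$, and independence predicts that the joint probability of the two events is their product. The value $\sigma = 2$, the seed, and the sample size are arbitrary choices.

```python
import math
import random

sigma = 2.0
rng = random.Random(11)
u1, u2 = [], []
while len(u1) < 200_000:
    x, y = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    if y == 0.0:          # probability-zero event; skip to avoid dividing by zero
        continue
    u1.append(math.hypot(x, y))   # U1 = sqrt(X^2 + Y^2)
    u2.append(x / y)              # U2 = X / Y
n = len(u1)

p1 = sum(v <= sigma for v in u1) / n                  # Rayleigh: 1 - e^{-1/2}
p2 = sum(abs(v) <= 1 for v in u2) / n                 # Cauchy: (2/pi) arctan 1 = 1/2
p12 = sum(a <= sigma and abs(b) <= 1 for a, b in zip(u1, u2)) / n
print(p1, 1 - math.exp(-0.5))
print(p2, 0.5)
print(p12, p1 * p2)   # close if U1 and U2 are independent
```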


An important application of the result in Remark 2 will appear in Theorem 4.7.2. 
Theorem 3. Let $(X, Y)$ be an RV of the continuous type with PDF $f$. Let

$$Z = X + Y, \qquad U = X - Y, \qquad\text{and}\qquad V = XY,$$

and let $W = X/Y$. Then the PDFs of $Z$, $U$, $V$, and $W$ are, respectively, given by

$$f_Z(z) = \int_{-\infty}^{\infty} f(x, z - x)\,dx, \tag{2}$$

$$f_U(u) = \int_{-\infty}^{\infty} f(u + y, y)\,dy, \tag{3}$$

$$f_V(v) = \int_{-\infty}^{\infty} f\!\left(x, \frac{v}{x}\right)\frac{1}{|x|}\,dx, \tag{4}$$

and

$$f_W(w) = \int_{-\infty}^{\infty} f(xw, x)\,|x|\,dx. \tag{5}$$

The proof is left as an exercise.


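Corollary. If $X$ and $Y$ are independent with PDFs $f_1$ and $f_2$, respectively, then

$$f_Z(z) = \int_{-\infty}^{\infty} f_1(x) f_2(z - x)\,dx, \tag{6}$$

$$f_U(u) = \int_{-\infty}^{\infty} f_1(u + y) f_2(y)\,dy, \tag{7}$$

$$f_V(v) = \int_{-\infty}^{\infty} f_1(x) f_2\!\left(\frac{v}{x}\right)\frac{1}{|x|}\,dx, \tag{8}$$

and

$$f_W(w) = \int_{-\infty}^{\infty} f_1(xw) f_2(x)\,|x|\,dx. \tag{9}$$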
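Formulas (6) and (9) can be evaluated numerically when closed forms are awkward. The sketch below uses a plain midpoint rule (an illustrative choice, not a method from the text) with $f_1 = f_2$ the $U(0, 1)$ density. For $Z = X + Y$ it should reproduce the triangular density ($z$ on $(0,1]$, $2 - z$ on $(1,2)$); for $W = X/Y$ the same integral gives $1/2$ for $0 < w \le 1$ and $1/(2w^2)$ for $w > 1$.

```python
def uniform_pdf(x):
    """PDF of the U(0, 1) distribution."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


def f_sum(z):
    """Formula (6): f_Z(z) = integral of f1(x) f2(z - x) dx."""
    return integrate(lambda x: uniform_pdf(x) * uniform_pdf(z - x), -1.0, 2.0)


def f_ratio(w):
    """Formula (9): f_W(w) = integral of f1(xw) f2(x) |x| dx."""
    return integrate(lambda x: uniform_pdf(x * w) * uniform_pdf(x) * abs(x), -1.0, 2.0)


print(f_sum(0.3), f_sum(1.4))      # compare with z = 0.3 and 2 - z = 0.6
print(f_ratio(0.5), f_ratio(4.0))  # compare with 1/2 and 1/(2 * 16)
```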


Remark 3. Let $F$ and $G$ be two absolutely continuous DFs; then

$$H(x) = \int_{-\infty}^{\infty} F(x - y) G'(y)\,dy = \int_{-\infty}^{\infty} G(x - y) F'(y)\,dy$$

is also an absolutely continuous DF with PDF

$$H'(x) = \int_{-\infty}^{\infty} F'(x - y) G'(y)\,dy = \int_{-\infty}^{\infty} G'(x - y) F'(y)\,dy.$$

If

$$F(x) = \sum_k p_k\,\varepsilon(x - x_k) \qquad\text{and}\qquad G(x) = \sum_j q_j\,\varepsilon(x - y_j)$$

are two DFs, then

$$H(x) = \sum_k \sum_j p_k q_j\,\varepsilon(x - x_k - y_j)$$

is also a DF of an RV of the discrete type. The DF $H$ is called the convolution of $F$ and $G$, and we write $H = F * G$. Clearly, the operation is commutative and associative; that is, if $F_1, F_2, F_3$ are DFs, $F_1 * F_2 = F_2 * F_1$ and $(F_1 * F_2) * F_3 = F_1 * (F_2 * F_3)$. In this terminology, if $X$ and $Y$ are independent RVs with DFs $F$ and $G$, respectively, $X + Y$ has the convolution DF $H = F * G$. Extension to an arbitrary number of independent RVs is obvious.
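In the discrete case the convolution $H = F * G$ puts mass $\sum p_kq_j$ at each point $x_k + y_j$, so it can be computed directly on PMFs. A minimal sketch with exact rational arithmetic (the die and coin PMFs are illustrative):

```python
from fractions import Fraction
from itertools import product


def convolve(p, q):
    """PMF of X + Y for independent X ~ p, Y ~ q (dicts mapping value -> probability)."""
    out = {}
    for (x, px), (y, qy) in product(p.items(), q.items()):
        out[x + y] = out.get(x + y, 0) + px * qy
    return out


die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

two_dice = convolve(die, die)
print(two_dice[7])                                                          # 1/6
print(convolve(die, coin) == convolve(coin, die))                           # commutativity
print(convolve(convolve(die, coin), die) == convolve(die, convolve(coin, die)))  # associativity
```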


Finally, we consider a technique based on MGF or CF which can be used in certain situations to determine the distribution of a function $g(X_1, X_2, \ldots, X_n)$ of $X_1, X_2, \ldots, X_n$.

Let $(X_1, X_2, \ldots, X_n)$ be an $n$-variate RV, and $g$ be a Borel-measurable function from $\mathcal{R}_n$ to $\mathcal{R}_1$.

Definition 1. If $(X_1, X_2, \ldots, X_n)$ is of the discrete type and

$$\sum_{x_1, \ldots, x_n} |g(x_1, x_2, \ldots, x_n)|\,P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} < \infty,$$

then the series

$$Eg(X_1, X_2, \ldots, X_n) = \sum_{x_1, \ldots, x_n} g(x_1, x_2, \ldots, x_n)\,P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\}$$

is called the expected value of $g(X_1, X_2, \ldots, X_n)$. If $(X_1, X_2, \ldots, X_n)$ is a continuous RV with joint PDF $f$, and if

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} |g(x_1, x_2, \ldots, x_n)|\,f(x_1, x_2, \ldots, x_n) \prod_{i=1}^n dx_i < \infty,$$

then

$$Eg(X_1, X_2, \ldots, X_n) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\,f(x_1, x_2, \ldots, x_n) \prod_{i=1}^n dx_i$$

is called the expected value of $g(X_1, X_2, \ldots, X_n)$.
Let $Y = g(X_1, X_2, \ldots, X_n)$, and let $h(y)$ be its PDF. If $E|Y| < \infty$, then

$$EY = \int_{-\infty}^{\infty} y\,h(y)\,dy.$$

An analog of Theorem 3.2.1 holds. That is,

$$\int_{-\infty}^{\infty} y\,h(y)\,dy = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\,f(x_1, x_2, \ldots, x_n) \prod_{i=1}^n dx_i,$$

in the sense that if either integral exists, so does the other, and the two are equal. The result also holds in the discrete case.

Some special functions of interest are $\sum_{j=1}^n X_j$ and $\prod_{j=1}^n X_j^{k_j}$, where $k_1, k_2, \ldots, k_n$ are nonnegative integers; $e^{\sum_{j=1}^n t_j X_j}$, where $t_1, t_2, \ldots, t_n$ are real numbers; and $e^{i\sum_{j=1}^n t_j X_j}$, where $i = \sqrt{-1}$.


Definition 2. Let $X_1, X_2, \ldots, X_n$ be jointly distributed. If $E\left(e^{\sum_{j=1}^n t_j X_j}\right)$ exists for $|t_j| \le h_j$, $j = 1, 2, \ldots, n$, for some $h_j > 0$, $j = 1, 2, \ldots, n$, we write

$$M(t_1, t_2, \ldots, t_n) = E e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n} \tag{10}$$

and call it the MGF of the joint distribution of $(X_1, X_2, \ldots, X_n)$ or, simply, the MGF of $(X_1, X_2, \ldots, X_n)$.

Definition 3. Let $t_1, t_2, \ldots, t_n$ be real numbers and $i = \sqrt{-1}$. Then the CF of $(X_1, X_2, \ldots, X_n)$ is defined by

$$\phi(t_1, t_2, \ldots, t_n) = E\exp\left(i\sum_{j=1}^n t_j X_j\right) = E\cos\left(\sum_{j=1}^n t_j X_j\right) + i\,E\sin\left(\sum_{j=1}^n t_j X_j\right). \tag{11}$$

As in the univariate case, $\phi(t_1, t_2, \ldots, t_n)$ always exists.


We will deal mostly with MGFs even though the condition that the MGF exist for $|t_j| \le h_j$, $j = 1, 2, \ldots, n$, restricts its application considerably. The multivariate MGF (CF) has properties similar to the univariate MGF discussed earlier. We state some of these without proof. For notational convenience we restrict ourselves to the bivariate case.


Theorem 4. The MGF $M(t_1, t_2)$ uniquely determines the joint distribution of $(X, Y)$, and conversely, if the MGF exists, it is unique.

Corollary. The MGF $M(t_1, t_2)$ completely determines the marginal distributions of $X$ and $Y$. Indeed,
$$M(t_1, 0) = E e^{t_1X} = M_X(t_1), \tag{12}$$

and

$$M(0, t_2) = E e^{t_2Y} = M_Y(t_2). \tag{13}$$

Theorem 5. If $M(t_1, t_2)$ exists, the moments of all orders of $(X, Y)$ exist and may be obtained from

$$\left.\frac{\partial^{m+n} M(t_1, t_2)}{\partial t_1^m\,\partial t_2^n}\right|_{t_1 = t_2 = 0} = E(X^m Y^n). \tag{14}$$

Thus

$$\frac{\partial M(0, 0)}{\partial t_1} = EX, \qquad \frac{\partial M(0, 0)}{\partial t_2} = EY,$$

$$\frac{\partial^2 M(0, 0)}{\partial t_1^2} = EX^2, \qquad \frac{\partial^2 M(0, 0)}{\partial t_2^2} = EY^2,$$

$$\frac{\partial^2 M(0, 0)}{\partial t_1\,\partial t_2} = E(XY),$$

and so on.

A formal definition of moments in the multivariate case will be given in Section 4.5.


Theorem 6. $X$ and $Y$ are independent RVs if and only if

$$M(t_1, t_2) = M(t_1, 0)\,M(0, t_2) \qquad\text{for all } t_1, t_2 \in \mathcal{R}. \tag{15}$$

Proof. Let $X$ and $Y$ be independent. Then

$$M(t_1, t_2) = E e^{t_1X + t_2Y} = (E e^{t_1X})(E e^{t_2Y}) = M(t_1, 0)\,M(0, t_2).$$

Conversely, if

$$M(t_1, t_2) = M(t_1, 0)\,M(0, t_2),$$

then in the continuous case,

$$\iint e^{t_1x + t_2y} f(x, y)\,dx\,dy = \left[\int e^{t_1x} f_1(x)\,dx\right]\left[\int e^{t_2y} f_2(y)\,dy\right],$$

that is,

$$\iint e^{t_1x + t_2y} f(x, y)\,dx\,dy = \iint e^{t_1x + t_2y} f_1(x) f_2(y)\,dx\,dy.$$

By the uniqueness of the MGF (Theorem 4) we must have

$$f(x, y) = f_1(x) f_2(y) \qquad\text{for all } (x, y) \in \mathcal{R}_2.$$

It follows that $X$ and $Y$ are independent. A similar proof is given in the case where $(X, Y)$ is of the discrete type.


The MGF technique uses the uniqueness property of Theorem 4. To find the distribution (DF, PDF, or PMF) of $Y = g(X_1, X_2, \ldots, X_n)$ we compute the MGF of $Y$ using the definition. If this MGF is one of a known kind, $Y$ must have the corresponding distribution. Although the technique applies to the case when $Y$ is an $m$-dimensional RV, $1 \le m \le n$, we will use it mostly for the $m = 1$ case.


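Example 8. Let us first consider a simple case when $X$ has the standard normal PDF

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Let $Y = X^2$. Then

$$M_Y(s) = E e^{sX^2} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(1/2)(1 - 2s)x^2}\,dx = \frac{1}{\sqrt{1 - 2s}} \qquad\text{for } s < \tfrac12.$$

It follows (see Section 5.3 and Example 2.5.7) that $Y$ has a chi-square PDF

$$w(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \qquad y > 0,$$

and $w(y) = 0$ for $y \le 0$.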
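The MGF computed in Example 8 can be checked by numerical integration. The sketch below applies a midpoint rule to $E e^{sX^2} = (2\pi)^{-1/2}\int e^{(s - 1/2)x^2}\,dx$ on a truncated interval (truncation limit and step count are arbitrary but generous) and compares with $(1 - 2s)^{-1/2}$ for a few $s < 1/2$.

```python
import math


def mgf_x_squared(s, lim=40.0, steps=40_000):
    """Midpoint-rule approximation of E exp(s X^2) for standard normal X, truncated to [-lim, lim].

    Only meaningful for s < 1/2, where the expectation is finite.
    """
    h = 2 * lim / steps
    total = 0.0
    for k in range(steps):
        x = -lim + (k + 0.5) * h
        total += math.exp((s - 0.5) * x * x)
    return total * h / math.sqrt(2 * math.pi)


for s in (-1.0, 0.0, 0.25, 0.4):
    print(s, mgf_x_squared(s), (1 - 2 * s) ** -0.5)
```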


Example 9. Suppose that $X_1$ and $X_2$ are independent with common PDF $f$ of Example 8. Let $Y_1 = X_1 - X_2$. There are three equivalent ways to use the MGF technique here. Let $Y_2 = X_2$. Then rather than compute

$$M(s_1, s_2) = E e^{s_1Y_1 + s_2Y_2},$$

it is simpler to recognize that $Y_1$ is univariate, so

$$M_{Y_1}(s) = E e^{s(X_1 - X_2)} = (E e^{sX_1})(E e^{-sX_2}) = e^{s^2/2} \cdot e^{s^2/2} = e^{s^2}.$$

It follows that $Y_1$ has PDF

$$f_{Y_1}(y) = \frac{1}{\sqrt{4\pi}} e^{-y^2/4}, \qquad -\infty < y < \infty.$$

Note that $M_{Y_1}(s) = M(s, 0)$.
Let $Y_3 = X_1 + X_2$. Let us find the joint distribution of $Y_1$ and $Y_3$. Indeed,

$$\begin{aligned}E e^{s_1Y_1 + s_2Y_3} &= E\left(e^{(s_1 + s_2)X_1} \cdot e^{(s_2 - s_1)X_2}\right) = \left(E e^{(s_1 + s_2)X_1}\right)\left(E e^{(s_2 - s_1)X_2}\right)\\ &= e^{(s_1 + s_2)^2/2} \cdot e^{(s_2 - s_1)^2/2} = e^{s_1^2} \cdot e^{s_2^2},\end{aligned}$$

and it follows that $Y_1$ and $Y_3$ are independent RVs with the common PDF $(4\pi)^{-1/2}e^{-y^2/4}$ obtained above.

The following result has many applications, as we will see. Example 9 is a special case.


Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent RVs with respective MGFs $M_i(s)$, $i = 1, 2, \ldots, n$. Then the MGF of $Y = \sum_{i=1}^n a_iX_i$ for real numbers $a_1, a_2, \ldots, a_n$ is given by

$$M_Y(s) = \prod_{i=1}^n M_i(a_is).$$

Proof. If $M_i$ exists for $|s| \le h_i$, $h_i > 0$, then $M_Y$ exists for $|s| \le \min_i (h_i/|a_i|)$ (indices with $a_i = 0$ impose no restriction), and

$$M_Y(s) = E e^{s\sum_{i=1}^n a_iX_i} = \prod_{i=1}^n E e^{a_isX_i} = \prod_{i=1}^n M_i(a_is).$$

Corollary. If the $X_i$'s are iid, the MGF of $Y = \sum_{i=1}^n X_i$ is given by $M_Y(s) = [M(s)]^n$.

Remark 4. The converse of Theorem 7 does not hold. We leave it to the reader to construct an example illustrating this fact.


Example 10. Let $X_1, X_2, \ldots, X_m$ be iid RVs with common PMF

$$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n;\ 0 < p < 1.$$

Then the MGF of $X_i$ is given by

$$M(t) = (1 - p + pe^t)^n.$$

It follows that the MGF of $S_m = X_1 + X_2 + \cdots + X_m$ is

$$M_{S_m}(t) = \prod_{i=1}^m (1 - p + pe^t)^n = (1 - p + pe^t)^{mn},$$

and we see that $S_m$ has the PMF

$$P\{S_m = s\} = \binom{mn}{s} p^s (1 - p)^{mn-s}, \qquad s = 0, 1, 2, \ldots, mn.$$
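The conclusion of Example 10 can be confirmed without MGFs by convolving $m$ copies of the binomial$(n, p)$ PMF directly and comparing with the binomial$(mn, p)$ PMF. A minimal sketch in exact arithmetic (the values $n = 4$, $m = 3$, $p = 1/3$ are arbitrary):

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Binomial(n, p) PMF as a list indexed by k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of independent integer RVs with PMFs a and b (lists indexed from 0)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


n, m, p = 4, 3, Fraction(1, 3)
pmf = binom_pmf(n, p)
total = pmf
for _ in range(m - 1):
    total = convolve(total, pmf)
print(total == binom_pmf(m * n, p))
```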


From these examples it is clear that to use this technique effectively one must be 
able to recognize the MGF of the function under consideration. In Chapter 5 we study 
a number of commonly occurring probability distributions and derive their MGFs 
(whenever they exist). We will have occasion to use Theorem 7 quite frequently. 

For integer-valued RVs one can sometimes use PGFs to compute the distribution 
of certain functions of a multiple RV. 

We emphasize the fact that a CF always exists and analogs of Theorems 4 to 7 can be stated in terms of CFs.


PROBLEMS 4.4 


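1. Let $F$ be a DF and $\varepsilon$ be a positive real number. Show that

$$\Psi_1(x) = \frac{1}{\varepsilon}\int_x^{x+\varepsilon} F(t)\,dt$$

and

$$\Psi_2(x) = \frac{1}{2\varepsilon}\int_{x-\varepsilon}^{x+\varepsilon} F(t)\,dt$$

are also distribution functions.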


2. Let $X, Y$ be iid RVs with common PDF

$$f(x) = \begin{cases}e^{-x}, & x > 0,\\ 0, & x \le 0.\end{cases}$$

(a) Find the PDF of the RVs $X + Y$, $X - Y$, $XY$, $X/Y$, $\min\{X, Y\}$, $\max\{X, Y\}$, $\min\{X, Y\}/\max\{X, Y\}$, and $X/(X + Y)$.
(b) Let $U = X + Y$ and $V = X - Y$. Find the conditional PDF of $V$, given $U = u$, for some fixed $u > 0$.

(c) Show that $U$ and $Z = X/(X + Y)$ are independent.


3. Let $X$ and $Y$ be independent RVs defined on the space $(\Omega, \mathcal{S}, P)$. Let $X$ be uniformly distributed on $(-a, a)$, $a > 0$, and $Y$ be an RV of the continuous type with density $f$, where $f$ is continuous and positive on $\mathcal{R}$. Let $F$ be the DF of $Y$. If $u_0 \in (-a, a)$ is a fixed number, show that

$$f_{Y|X+Y}(y \mid u_0) = \begin{cases}\dfrac{f(y)}{F(u_0 + a) - F(u_0 - a)}, & u_0 - a < y < u_0 + a,\\[1ex] 0, & \text{otherwise,}\end{cases}$$

where $f_{Y|X+Y}(y \mid u_0)$ is the conditional density function of $Y$, given $X + Y = u_0$.
4. Let $X$ and $Y$ be iid RVs with common PDF

$$f(x) = \begin{cases}1, & 0 < x < 1,\\ 0, & \text{otherwise.}\end{cases}$$

Find the PDFs of the RVs $XY$, $X/Y$, $\min\{X, Y\}$, $\max\{X, Y\}$, and $\min\{X, Y\}/\max\{X, Y\}$.
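5. Let $X_1, X_2, X_3$ be iid RVs with common density function

$$f(x) = \begin{cases}1, & 0 < x < 1,\\ 0, & \text{otherwise.}\end{cases}$$

Show that the PDF of $U = X_1 + X_2 + X_3$ is given by

$$f_U(u) = \begin{cases}\dfrac{u^2}{2}, & 0 < u < 1,\\[1ex] 3u - u^2 - \dfrac{3}{2}, & 1 \le u < 2,\\[1ex] \dfrac{(u - 3)^2}{2}, & 2 \le u < 3,\\[1ex] 0, & \text{elsewhere.}\end{cases}$$

An extension to the $n$-variate case holds.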


6. Let $X$ and $Y$ be independent RVs with common geometric PMF

$$P\{X = k\} = \pi(1 - \pi)^k, \qquad k = 0, 1, 2, \ldots;\ 0 < \pi < 1.$$

Also, let $M = \max\{X, Y\}$. Find the joint distribution of $M$ and $X$, the marginal distribution of $M$, and the conditional distribution of $X$, given $M$.


7. Let X be a nonnegative RV of the continuous type. The integral part, Y, of X is 
distributed with PMF P(Y = k} = Аел, k =0,1,2,... , 4 > 0; and the 
fractional part, $Z$, of $X$ has PDF $f_Z(z) = 1$ if $0 < z < 1$, and $= 0$ otherwise. Find the PDF of $X$, assuming that $Y$ and $Z$ are independent.


8. Let $X$ and $Y$ be independent RVs. If at least one of $X$ and $Y$ is of the continuous type, show that $X + Y$ is also of the continuous type. What if $X$ and $Y$ are not independent?
9. Let $X$ and $Y$ be independent integer-valued RVs. Show that

$$P(t) = P_X(t)P_Y(t),$$

where $P$, $P_X$, and $P_Y$, respectively, are the PGFs of $X + Y$, $X$, and $Y$.


10. Let $X$ and $Y$ be independent nonnegative RVs of the continuous type with PDFs $f$ and $g$, respectively. Let $f(x) = e^{-x}$ if $x > 0$, and $= 0$ if $x \le 0$, and let $g$ be arbitrary. Show that the MGF $M(t)$ of $Y$, which is assumed to exist, has the property that the DF of $X/Y$ is $1 - M(-t)$.
11. Let $X, Y, Z$ have the joint PDF

$$f(x, y, z) = \begin{cases}6(1 + x + y + z)^{-4}, & 0 < x,\ 0 < y,\ 0 < z,\\ 0, & \text{otherwise.}\end{cases}$$

Find the PDF of $U = X + Y + Z$.
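12. Let $X$ and $Y$ be iid RVs with common PDF

$$f(x) = \begin{cases}\dfrac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{(\log x)^2}{2\sigma^2}\right\}, & x > 0,\\[1ex] 0, & x \le 0.\end{cases}$$

Find the PDF of $Z = XY$.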


13. Let $X$ and $Y$ be iid RVs with common PDF $f$ defined in Example 8. Find the joint PDF of $U$ and $V$ in the following cases:
(a) $U = \sqrt{X^2 + Y^2}$, $V = \tan^{-1}(X/Y)$, $-\pi/2 < V < \pi/2$.
(b) $U = (X + Y)/2$, $V = (X - Y)/2$.

14. Construct an example to show that even when the MGF of $X + Y$ can be written as a product of the MGF of $X$ and the MGF of $Y$, $X$ and $Y$ need not be independent.


15. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

$$f(x) = \frac{1}{b - a}, \quad a < x < b, \qquad = 0 \text{ otherwise.}$$

Using the distribution function technique, show that:

(a) The joint PDF of $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$ and $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ is given by

$$u(x, y) = \frac{n(n - 1)(x - y)^{n-2}}{(b - a)^n}, \qquad a < y < x < b,$$

and $= 0$ otherwise.
(b) The PDF of $X_{(n)}$ is given by

$$g(z) = \frac{n(z - a)^{n-1}}{(b - a)^n}, \qquad a < z < b, \qquad = 0 \text{ otherwise},$$

and that of $X_{(1)}$ by

$$h(z) = \frac{n(b - z)^{n-1}}{(b - a)^n}, \qquad a < z < b, \qquad = 0 \text{ otherwise}.$$


16. Let $X_1, X_2$ be iid with common Poisson PMF

$$P\{X_i = x\} = e^{-\lambda}\frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,\ i = 1, 2,$$

where $\lambda > 0$ is a constant. Let $X_{(2)} = \max(X_1, X_2)$ and $X_{(1)} = \min(X_1, X_2)$. Find the PMF of $X_{(2)}$.


17. Let $X$ have the binomial PMF

$$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n;\ 0 < p < 1.$$

Let $Y$ be independent of $X$ and $Y \stackrel{d}{=} X$. Find the PMF of $U = X + Y$ and $W = X - Y$.


4.5 COVARIANCE, CORRELATION, AND MOMENTS 
Let $X$ and $Y$ be jointly distributed on $(\Omega, \mathcal{S}, P)$. In Section 4.4 we defined $Eg(X, Y)$ for Borel functions $g$ on $\mathcal{R}_2$. Functions of the form $g(x, y) = x^jy^k$, where $j$ and $k$ are nonnegative integers, are of interest in probability and statistics.

Definition 1. If $E|X^jY^k| < \infty$ for nonnegative integers $j$ and $k$, we call $E(X^jY^k)$ a moment of order $(j + k)$ of $(X, Y)$ and write

$$m_{jk} = E(X^jY^k). \tag{1}$$

Clearly,

$$m_{10} = EX, \quad m_{01} = EY, \quad m_{20} = EX^2, \quad m_{11} = E(XY), \quad\text{and}\quad m_{02} = EY^2. \tag{2}$$

Definition 2. If $E|(X - EX)^j(Y - EY)^k| < \infty$ for nonnegative integers $j$ and $k$, we call $E\{(X - EX)^j(Y - EY)^k\}$ a central moment of order $(j + k)$ and write

$$\mu_{jk} = E\{(X - EX)^j(Y - EY)^k\}. \tag{3}$$

Clearly,

$$\mu_{10} = \mu_{01} = 0, \quad \mu_{20} = \operatorname{var}(X), \quad \mu_{02} = \operatorname{var}(Y), \quad\text{and}\quad \mu_{11} = E[(X - m_{10})(Y - m_{01})]. \tag{4}$$

We see easily that

$$\mu_{11} = E(XY) - EX\,EY. \tag{5}$$


Note that if $X$ and $Y$ increase (or decrease) together, then $(X - EX)(Y - EY)$ should be positive, whereas if $X$ decreases while $Y$ increases (and conversely), the product should be negative. Hence the average value of $(X - EX)(Y - EY)$, namely $\mu_{11}$, provides a measure of association or joint variation between $X$ and $Y$.


Definition 3. If $E[(X - EX)(Y - EY)]$ exists, we call it the covariance between $X$ and $Y$ and write

$$\operatorname{cov}(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - EX\,EY. \tag{6}$$


Recall (Theorem 3.2.8) that $E(Y - a)^2$ is minimized when we choose $a = EY$, so that $EY$ may be interpreted as the best constant predictor of $Y$. If, instead, we choose to predict $Y$ by a linear function of $X$, say $aX + b$, and measure the error in this prediction by $E(Y - aX - b)^2$, we should choose $a$ and $b$ to minimize this mean square error. Clearly, $E(Y - aX - b)^2$ is minimized, for any $a$, by choosing $b = E(Y - aX) = EY - aEX$. With this choice of $b$, and writing $\sigma_X^2 = \operatorname{var}(X)$, $\sigma_Y^2 = \operatorname{var}(Y)$, we find $a$ such that

$$E(Y - aX - b)^2 = E[(Y - EY) - a(X - EX)]^2 = \sigma_Y^2 - 2a\mu_{11} + a^2\sigma_X^2$$

is minimum. An easy computation (set the derivative $-2\mu_{11} + 2a\sigma_X^2$ equal to zero) shows that the minimum occurs if we choose

$$a = \frac{\mu_{11}}{\sigma_X^2}, \tag{7}$$

provided that $\sigma_X^2 > 0$. Moreover,

$$\min_{a,b} E(Y - aX - b)^2 = \min_a\left[\sigma_Y^2 - 2a\mu_{11} + a^2\sigma_X^2\right] = \sigma_Y^2 - \frac{\mu_{11}^2}{\sigma_X^2}. \tag{8}$$
Let us write

$$\rho = \frac{\mu_{11}}{\sigma_X\sigma_Y}. \tag{9}$$

Then (8) shows that predicting $Y$ by a linear function of $X$ reduces the prediction error from $\sigma_Y^2$ to $\sigma_Y^2(1 - \rho^2)$. We may therefore think of $\rho$ as a measure of the linear dependence between RVs $X$ and $Y$.
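The following sketch illustrates (7) and (8) on a small hypothetical joint PMF (four equally likely points, chosen only for illustration). It computes $a = \mu_{11}/\sigma_X^2$ and $b = EY - aEX$ exactly, confirms that the resulting mean square error equals $\sigma_Y^2 - \mu_{11}^2/\sigma_X^2 = \sigma_Y^2(1 - \rho^2)$, and checks that nudging $a$ or $b$ makes the error larger.

```python
from fractions import Fraction as F

# Hypothetical joint PMF of (X, Y): list of (x, y, probability).
pmf = [(0, 0, F(1, 4)), (1, 1, F(1, 4)), (2, 1, F(1, 4)), (3, 3, F(1, 4))]


def E(g):
    """Expectation of g(X, Y) under the PMF above."""
    return sum(p * g(x, y) for x, y, p in pmf)


ex, ey = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
mu11 = E(lambda x, y: (x - ex) * (y - ey))

a = mu11 / var_x          # equation (7)
b = ey - a * ex           # b = EY - a EX


def mse(a_, b_):
    """Mean square error E(Y - a_ X - b_)^2."""
    return E(lambda x, y: (y - a_ * x - b_) ** 2)


rho_sq = mu11 ** 2 / (var_x * var_y)
print(a, b, mse(a, b), var_y - mu11 ** 2 / var_x, var_y * (1 - rho_sq))
```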


Definition 4. If $EX^2$, $EY^2$ exist, we define the correlation coefficient between $X$ and $Y$ as

$$\rho = \frac{\operatorname{cov}(X, Y)}{SD(X)\,SD(Y)} = \frac{E(XY) - EX\,EY}{\sqrt{EX^2 - (EX)^2}\,\sqrt{EY^2 - (EY)^2}}, \tag{10}$$

where $SD(X)$ denotes the standard deviation of RV $X$.

We note that for any two real numbers $a$ and $b$,

$$|ab| \le \frac{a^2 + b^2}{2},$$

so that $E|XY| < \infty$ if $EX^2 < \infty$ and $EY^2 < \infty$.


Definition 5. We say that RVs $X$ and $Y$ are uncorrelated if $\rho = 0$, or equivalently, $\operatorname{cov}(X, Y) = 0$.

If $X$ and $Y$ are independent, then $E(XY) = EX\,EY$, so from (5) $\operatorname{cov}(X, Y) = 0$ and $X$ and $Y$ are uncorrelated. If, however, $\rho = 0$, then $X$ and $Y$ need not be independent.


Example 1. Let $U$ and $V$ be two RVs with common mean and common variance. Let $X = U + V$ and $Y = U - V$. Then

$$\operatorname{cov}(X, Y) = E(U^2 - V^2) - E(U + V)E(U - V) = 0,$$

so that $X$ and $Y$ are uncorrelated but not necessarily independent (see Example 4.4.9).

Let us now study some properties of the correlation coefficient. From the definition we see that $\rho$ [and also $\operatorname{cov}(X, Y)$] is symmetric in $X$ and $Y$.


Theorem 1

(a) The correlation coefficient $\rho$ between two RVs $X$ and $Y$ satisfies

$$|\rho| \le 1. \tag{11}$$

(b) The equality $|\rho| = 1$ holds if and only if there exist constants $a \ne 0$ and $b$ such that $P\{Y = aX + b\} = 1$.

Proof. From (8), since $E(Y - aX - b)^2 \ge 0$, we must have $1 - \rho^2 \ge 0$, or equivalently, (11) holds.

Equality in (11) holds if and only if $\rho^2 = 1$, or equivalently, $E(Y - aX - b)^2 = 0$ holds for $a$ and $b$ chosen as above. This implies and is implied by $P\{Y = aX + b\} = 1$. Here $a \ne 0$.

Remark 1. From (7) and (9) we note that the signs of $a$ and $\rho$ are the same, so if $\rho = 1$, then $P\{Y = aX + b\} = 1$ where $a > 0$, and if $\rho = -1$, then $a < 0$.


Theorem 2. Let $EX^2 < \infty$, $EY^2 < \infty$, and let $U = aX + b$, $V = cY + d$, where $ac \ne 0$. Then

$$\rho_{X,Y} = \pm\rho_{U,V},$$

where $\rho_{X,Y}$ and $\rho_{U,V}$, respectively, are the correlation coefficients between $X$ and $Y$ and between $U$ and $V$; the sign is that of $ac$.

The proof is simple and is left as an exercise.


Example 2. Let $X, Y$ be identically distributed with common PMF

$$P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N\ (N > 1).$$

Then

$$EX = EY = \frac{N + 1}{2}, \qquad EX^2 = EY^2 = \frac{(N + 1)(2N + 1)}{6},$$

so that

$$\operatorname{var}(X) = \operatorname{var}(Y) = \frac{N^2 - 1}{12}.$$

Also,

$$E(XY) = \tfrac12\left[EX^2 + EY^2 - E(X - Y)^2\right] = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2}.$$

Thus

$$\operatorname{cov}(X, Y) = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2} - \frac{(N + 1)^2}{4} = \frac{(N + 1)(N - 1)}{12} - \frac12E(X - Y)^2,$$

and

$$\rho_{X,Y} = \frac{(N^2 - 1)/12 - E(X - Y)^2/2}{(N^2 - 1)/12} = 1 - \frac{6E(X - Y)^2}{N^2 - 1}.$$

If $P\{X = Y\} = 1$, then $\rho = 1$, and conversely. If $P\{Y = N + 1 - X\} = 1$, then

$$E(X - Y)^2 = E(2X - N - 1)^2 = \frac{4(N + 1)(2N + 1)}{6} - 4(N + 1)\frac{N + 1}{2} + (N + 1)^2 = \frac{(N + 1)(N - 1)}{3},$$

and it follows that $\rho_{X,Y} = -1$. Conversely, if $\rho_{X,Y} = -1$, from Remark 1 it follows that $Y = -aX + b$ with probability 1 for some $a > 0$ and some real number $b$. To find $a$ and $b$, we note that $EY = -aEX + b$, so that $b = [(N + 1)/2](1 + a)$. Also, $EY^2 = E(b - aX)^2$, which yields

$$(1 - a^2)EX^2 + 2abEX - b^2 = 0.$$

Substituting for $b$ in terms of $a$ and the values of $EX^2$ and $EX$, we see that $a^2 = 1$, so that $a = 1$. Hence $b = N + 1$, and it follows that $Y = N + 1 - X$ with probability 1.
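As a check on Example 2, the sketch below computes $\rho$ exactly in two ways, directly as $\operatorname{cov}/\operatorname{var}$ and through $1 - 6E(X - Y)^2/(N^2 - 1)$, for three choices of $Y$ that are permutations of $X$ (so that $Y$ is identically distributed with $X$, as the example requires). Besides $Y = X$ and $Y = N + 1 - X$, a cyclic shift is included as an illustrative intermediate case; for $N = 10$ it gives $\rho = 5/11$.

```python
from fractions import Fraction


def rho_direct(N, y_of_x):
    """cov/var for X uniform on {1..N} and Y = y_of_x(X).

    Valid as a correlation only when y_of_x permutes {1..N}, so that Y has
    the same distribution (hence mean and variance) as X.
    """
    p = Fraction(1, N)
    xs = range(1, N + 1)
    mean = sum(p * x for x in xs)
    var = sum(p * (x - mean) ** 2 for x in xs)
    cov = sum(p * (x - mean) * (y_of_x(x) - mean) for x in xs)
    return cov / var


def rho_formula(N, y_of_x):
    """rho = 1 - 6 E(X - Y)^2 / (N^2 - 1), as derived in Example 2."""
    p = Fraction(1, N)
    e_sq = sum(p * (x - y_of_x(x)) ** 2 for x in range(1, N + 1))
    return 1 - 6 * e_sq / (N ** 2 - 1)


N = 10
maps = {
    "Y = X": lambda x: x,
    "Y = N + 1 - X": lambda x: N + 1 - x,
    "Y = X + 1 (cyclically)": lambda x: x % N + 1,
}
for name, g in maps.items():
    print(name, rho_direct(N, g), rho_formula(N, g))
```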


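Example 3. Let $(X, Y)$ be jointly distributed with density function

$$f(x, y) = \begin{cases}x + y, & 0 < x < 1,\ 0 < y < 1,\\ 0, & \text{otherwise.}\end{cases}$$

Then

$$E(X^lY^m) = \int_0^1\int_0^1 x^ly^m(x + y)\,dx\,dy = \int_0^1\int_0^1 \left(x^{l+1}y^m + x^ly^{m+1}\right)dx\,dy = \frac{1}{(l + 2)(m + 1)} + \frac{1}{(l + 1)(m + 2)},$$

where $l$ and $m$ are nonnegative integers. Thus

$$EX = EY = \frac{7}{12}, \qquad EX^2 = EY^2 = \frac{5}{12},$$

$$\operatorname{var}(X) = \operatorname{var}(Y) = \frac{5}{12} - \frac{49}{144} = \frac{11}{144},$$

and

$$\operatorname{cov}(X, Y) = \frac13 - \frac{49}{144} = -\frac{1}{144}, \qquad \rho = \frac{-1/144}{11/144} = -\frac{1}{11}.$$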
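The moments in Example 3 follow mechanically from the formula for $E(X^lY^m)$, so they are easy to reproduce in exact arithmetic. The sketch also cross-checks $E(XY) = 1/3$ with a crude midpoint-rule double integral (the grid size is an arbitrary choice).

```python
from fractions import Fraction


def moment(l, m):
    """E(X^l Y^m) for the density x + y on the unit square, by the formula above."""
    return Fraction(1, (l + 2) * (m + 1)) + Fraction(1, (l + 1) * (m + 2))


def midpoint_2d(g, k=400):
    """Midpoint-rule approximation of the integral of g(x, y) (x + y) over the unit square."""
    h = 1.0 / k
    total = 0.0
    for i in range(k):
        x = (i + 0.5) * h
        for j in range(k):
            y = (j + 0.5) * h
            total += g(x, y) * (x + y)
    return total * h * h


ex, ex2, exy = moment(1, 0), moment(2, 0), moment(1, 1)
var = ex2 - ex ** 2
cov = exy - ex * moment(0, 1)
rho = cov / var              # var(X) = var(Y) here by symmetry
print(moment(0, 0), ex, ex2, var, cov, rho)
print(midpoint_2d(lambda x, y: x * y), float(exy))
```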
Theorem 3. Let $X_1, X_2, \ldots, X_n$ be RVs such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$. Let $a_1, a_2, \ldots, a_n$ be real numbers, and write

$$S = a_1X_1 + a_2X_2 + \cdots + a_nX_n.$$

Then $ES$ exists, and we have

$$ES = \sum_{j=1}^n a_jEX_j. \tag{12}$$

Proof. If $(X_1, X_2, \ldots, X_n)$ is of the discrete type, then

$$\begin{aligned}ES &= \sum_{i_1, i_2, \ldots, i_n} (a_1x_{1i_1} + a_2x_{2i_2} + \cdots + a_nx_{ni_n})\,P\{X_1 = x_{1i_1}, X_2 = x_{2i_2}, \ldots, X_n = x_{ni_n}\}\\ &= a_1\sum_{i_1}x_{1i_1}\sum_{i_2, \ldots, i_n}P\{X_1 = x_{1i_1}, \ldots, X_n = x_{ni_n}\} + \cdots + a_n\sum_{i_n}x_{ni_n}\sum_{i_1, \ldots, i_{n-1}}P\{X_1 = x_{1i_1}, \ldots, X_n = x_{ni_n}\}\\ &= a_1\sum_{i_1}x_{1i_1}P\{X_1 = x_{1i_1}\} + \cdots + a_n\sum_{i_n}x_{ni_n}P\{X_n = x_{ni_n}\}\\ &= a_1EX_1 + \cdots + a_nEX_n.\end{aligned}$$

The existence of $ES$ follows easily by replacing each $a_j$ by $|a_j|$ and each $x_{ji_j}$ by $|x_{ji_j}|$ and remembering that $E|X_j| < \infty$, $j = 1, 2, \ldots, n$. The case of continuous type $(X_1, X_2, \ldots, X_n)$ is treated similarly.


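Corollary. Take $a_1 = a_2 = \cdots = a_n = 1/n$. Then

$$E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac1n\sum_{i=1}^n EX_i,$$

and if $EX_1 = EX_2 = \cdots = EX_n = \mu$, then

$$E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \mu.$$

Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent RVs such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$. Then $E\left(\prod_{i=1}^n X_i\right)$ exists and

$$E\left(\prod_{i=1}^n X_i\right) = \prod_{i=1}^n EX_i. \tag{13}$$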


Let $X$ and $Y$ be independent and $g_1(\cdot)$ and $g_2(\cdot)$ be Borel-measurable functions. Then we know (Theorem 4.3.2) that $g_1(X)$ and $g_2(Y)$ are independent. If $E[g_1(X)]$, $E[g_2(Y)]$, and $E[g_1(X)g_2(Y)]$ exist, it follows from Theorem 4 that

$$E[g_1(X)g_2(Y)] = E[g_1(X)]\,E[g_2(Y)]. \tag{14}$$

Conversely, if for any Borel sets $A_1$ and $A_2$ we take $g_1(X) = 1$ if $X \in A_1$, and $= 0$ otherwise, and $g_2(Y) = 1$ if $Y \in A_2$, and $= 0$ otherwise, then

$$E[g_1(X)g_2(Y)] = P\{X \in A_1, Y \in A_2\}$$

and $E[g_1(X)] = P\{X \in A_1\}$, $E[g_2(Y)] = P\{Y \in A_2\}$. Relation (14) implies that for any Borel sets $A_1$ and $A_2$ of real numbers

$$P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}P\{Y \in A_2\}.$$

It follows that $X$ and $Y$ are independent if (14) holds. We have thus proved the following theorem.

Theorem 5. Two RVs $X$ and $Y$ are independent if and only if for every pair of Borel-measurable functions $g_1$ and $g_2$ the relation

$$E[g_1(X)g_2(Y)] = E[g_1(X)]\,E[g_2(Y)] \tag{15}$$

holds, provided that the expectations on both sides of (15) exist.


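Theorem 6. Let $X_1, X_2, \ldots, X_n$ be RVs with $E|X_i|^2 < \infty$ for $i = 1, 2, \ldots, n$. Let $a_1, a_2, \ldots, a_n$ be real numbers and write $S = \sum_{i=1}^n a_iX_i$. Then the variance of $S$ exists and is given by

$$\operatorname{var}(S) = \sum_{i=1}^n a_i^2\operatorname{var}(X_i) + \sum_{i=1}^n\sum_{\substack{j=1\\ j \ne i}}^n a_ia_j\operatorname{cov}(X_i, X_j). \tag{16}$$

If, in particular, $X_1, X_2, \ldots, X_n$ are such that $\operatorname{cov}(X_i, X_j) = 0$ for $i, j = 1, 2, \ldots, n$, $i \ne j$, then

$$\operatorname{var}(S) = \sum_{i=1}^n a_i^2\operatorname{var}(X_i). \tag{17}$$

Proof. We have

$$\begin{aligned}\operatorname{var}(S) &= E\left(\sum_{i=1}^n a_iX_i - \sum_{i=1}^n a_iEX_i\right)^2\\ &= E\left[\sum_{i=1}^n a_i^2(X_i - EX_i)^2 + \sum\sum_{i \ne j} a_ia_j(X_i - EX_i)(X_j - EX_j)\right]\\ &= \sum_{i=1}^n a_i^2E(X_i - EX_i)^2 + \sum\sum_{i \ne j} a_ia_jE[(X_i - EX_i)(X_j - EX_j)].\end{aligned}$$

If the $X_i$'s satisfy

$$\operatorname{cov}(X_i, X_j) = 0 \qquad\text{for } i, j = 1, 2, \ldots, n;\ i \ne j,$$

the second term on the right side of (16) vanishes, and we have (17).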


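Corollary 1. Let $X_1, X_2, \ldots, X_n$ be exchangeable RVs with $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Then

$$\operatorname{var}\left(\sum_{i=1}^n a_iX_i\right) = \sigma^2\sum_{i=1}^n a_i^2 + \rho\sigma^2\sum\sum_{i \ne j} a_ia_j,$$

where $\rho$ is the correlation coefficient between $X_i$ and $X_j$, $i \ne j$. In particular,

$$\operatorname{var}\left(\sum_{i=1}^n \frac{X_i}{n}\right) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\rho\sigma^2.$$

Corollary 2. If $X_1, X_2, \ldots, X_n$ are exchangeable and uncorrelated, then

$$\operatorname{var}\left(\sum_{i=1}^n a_iX_i\right) = \sigma^2\sum_{i=1}^n a_i^2,$$

and

$$\operatorname{var}\left(\sum_{i=1}^n \frac{X_i}{n}\right) = \frac{\sigma^2}{n}.$$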


Theorem 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common variance $\sigma^2$. Also, let $a_1, a_2, \ldots, a_n$ be real numbers such that $\sum_{i=1}^n a_i = 1$, and let $S = \sum_{i=1}^n a_iX_i$. Then the variance of $S$ is least if we choose $a_i = 1/n$, $i = 1, 2, \ldots, n$.

Proof. We have

$$\operatorname{var}(S) = \sigma^2\sum_{i=1}^n a_i^2,$$

which is least if and only if we choose the $a_i$'s so that $\sum_{i=1}^n a_i^2$ is smallest, subject to the condition $\sum_{i=1}^n a_i = 1$. We have

$$\sum_{i=1}^n a_i^2 = \sum_{i=1}^n\left(a_i - \frac1n + \frac1n\right)^2 = \sum_{i=1}^n\left(a_i - \frac1n\right)^2 + \frac2n\sum_{i=1}^n\left(a_i - \frac1n\right) + \frac1n = \sum_{i=1}^n\left(a_i - \frac1n\right)^2 + \frac1n,$$

since the middle sum vanishes when $\sum_{i=1}^n a_i = 1$. This is minimized for the choice $a_i = 1/n$, $i = 1, 2, \ldots, n$.

Note that the result holds if we replace independence by the condition that the $X_i$'s are exchangeable and uncorrelated.


Example 4. Suppose that $r$ balls are drawn one at a time without replacement from a bag containing $n$ white and $m$ black balls. Let $S_r$ be the number of black balls drawn.

Let us define RVs $X_k$ as follows:

$$X_k = \begin{cases}1 & \text{if the } k\text{th ball drawn is black},\\ 0 & \text{if the } k\text{th ball drawn is white},\end{cases} \qquad k = 1, 2, \ldots, r.$$

Then

$$S_r = X_1 + X_2 + \cdots + X_r.$$

Also,

$$P\{X_k = 1\} = \frac{m}{m + n}, \qquad P\{X_k = 0\} = \frac{n}{m + n}. \tag{18}$$

Thus $EX_k = m/(m + n)$, and

$$\operatorname{var}(X_k) = \frac{m}{m + n} - \frac{m^2}{(m + n)^2} = \frac{mn}{(m + n)^2}.$$

To compute $\operatorname{cov}(X_j, X_k)$, $j \ne k$, note that the RV $X_jX_k = 1$ if the $j$th and $k$th balls drawn are black, and $= 0$ otherwise. Thus

$$E(X_jX_k) = P\{X_j = 1, X_k = 1\} = \frac{m}{m + n}\cdot\frac{m - 1}{m + n - 1} \tag{19}$$

and

$$\operatorname{cov}(X_j, X_k) = \frac{m(m - 1)}{(m + n)(m + n - 1)} - \frac{m^2}{(m + n)^2} = \frac{-mn}{(m + n)^2(m + n - 1)}.$$

Thus

$$ES_r = \sum_{k=1}^r EX_k = \frac{mr}{m + n}$$

and

$$\operatorname{var}(S_r) = r\frac{mn}{(m + n)^2} - r(r - 1)\frac{mn}{(m + n)^2(m + n - 1)} = \frac{mnr(m + n - r)}{(m + n)^2(m + n - 1)}.$$

Readers are asked to satisfy themselves that (18) and (19) hold.
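The mean and variance of $S_r$ can be cross-checked against the exact distribution of $S_r$, which is hypergeometric: $P\{S_r = s\} = \binom{m}{s}\binom{n}{r-s}\big/\binom{m+n}{r}$. A minimal sketch in exact arithmetic over a range of small $(m, n, r)$:

```python
from fractions import Fraction
from math import comb


def hypergeom_moments(m, n, r):
    """Exact mean and variance of S_r, the number of black balls among r draws
    without replacement from m black and n white balls."""
    total = comb(m + n, r)
    pmf = {s: Fraction(comb(m, s) * comb(n, r - s), total)
           for s in range(max(0, r - n), min(r, m) + 1)}
    mean = sum(s * p for s, p in pmf.items())
    var = sum((s - mean) ** 2 * p for s, p in pmf.items())
    return mean, var


def example4_moments(m, n, r):
    """ES_r and var(S_r) from the formulas of Example 4."""
    return (Fraction(m * r, m + n),
            Fraction(m * n * r * (m + n - r), (m + n) ** 2 * (m + n - 1)))


print(hypergeom_moments(5, 7, 4), example4_moments(5, 7, 4))
```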


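Example 5. Let $X_1, X_2, \ldots, X_n$ be independent, and $a_1, a_2, \ldots, a_n$ be real numbers such that $\sum_{i=1}^n a_i = 1$. Assume that $E|X_i|^2 < \infty$, $i = 1, 2, \ldots, n$, and let $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots, n$. Write $S = \sum_{i=1}^n a_iX_i$. Then $\operatorname{var}(S) = \sum_{i=1}^n a_i^2\sigma_i^2 = \sigma^2$, say. To find weights $a_i$ such that $\sigma^2$ is minimum, we write

$$\sigma^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_{n-1}^2\sigma_{n-1}^2 + (1 - a_1 - a_2 - \cdots - a_{n-1})^2\sigma_n^2,$$

and differentiate partially with respect to $a_1, a_2, \ldots, a_{n-1}$, respectively. We get

$$\frac{\partial\sigma^2}{\partial a_1} = 2a_1\sigma_1^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1})\sigma_n^2 = 0,$$

$$\vdots$$

$$\frac{\partial\sigma^2}{\partial a_{n-1}} = 2a_{n-1}\sigma_{n-1}^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1})\sigma_n^2 = 0.$$

It follows that

$$a_j\sigma_j^2 = a_n\sigma_n^2, \qquad j = 1, 2, \ldots, n - 1,$$

that is, the weights $a_j$, $j = 1, 2, \ldots, n$, should be chosen proportional to $1/\sigma_j^2$, namely $a_j = (1/\sigma_j^2)\big/\sum_{k=1}^n(1/\sigma_k^2)$. The minimum value of $\sigma^2$ is then

$$\sigma^2_{\min} = \sum_{j=1}^n a_j^2\sigma_j^2 = \frac{1}{\sum_{k=1}^n 1/\sigma_k^2} = \frac{H}{n},$$

where $H$ is the harmonic mean of the $\sigma_j^2$.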
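A numerical illustration of Example 5, using hypothetical variances $\sigma_1^2, \sigma_2^2, \sigma_3^2 = 1, 4, 9$: the inverse-variance weights give $\operatorname{var}(S) = H/n$, and both equal weights and a small perturbation of the optimal weights (still summing to 1) do worse.

```python
from fractions import Fraction

sig2 = [Fraction(1), Fraction(4), Fraction(9)]         # hypothetical variances sigma_j^2
inv = [1 / s for s in sig2]
a = [w / sum(inv) for w in inv]                         # weights proportional to 1/sigma_j^2


def var_s(weights):
    """var(sum a_j X_j) for independent X_j with variances sig2."""
    return sum(w * w * s for w, s in zip(weights, sig2))


n = len(sig2)
H = n / sum(inv)                                        # harmonic mean of the sigma_j^2
print(a, var_s(a), H / n)
print(var_s([Fraction(1, n)] * n))                      # equal weights, for comparison
```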


We conclude this section with some important moment inequalities. We begin with the simple inequality

$$|a + b|^r \le c_r(|a|^r + |b|^r), \tag{20}$$

where $c_r = 1$ for $0 \le r \le 1$, and $= 2^{r-1}$ for $r > 1$. For $r = 0$ and $r = 1$, (20) is trivially true.

First note that it is sufficient to prove (20) when $0 < a \le b$: since $|a + b| \le |a| + |b|$, we may replace $a$, $b$ by $|a|$, $|b|$; by symmetry we may take $|a| \le |b|$; and the case $a = 0$ is trivial. Let $0 < a \le b$, and write $x = a/b$. Then

$$\frac{(a + b)^r}{a^r + b^r} = \frac{(1 + x)^r}{1 + x^r}.$$

Writing $f(x) = (1 + x)^r/(1 + x^r)$, we see that

$$f'(x) = \frac{r(1 + x)^{r-1}\left(1 - x^{r-1}\right)}{(1 + x^r)^2},$$

where $0 < x \le 1$. It follows that $f'(x) > 0$ if $r > 1$, $= 0$ if $r = 1$, and $< 0$ if $r < 1$. Thus

$$\max_{0 \le x \le 1} f(x) = f(0) = 1 \qquad\text{if } r \le 1,$$

while

$$\max_{0 \le x \le 1} f(x) = f(1) = 2^{r-1} \qquad\text{if } r \ge 1.$$

Note that $|a + b|^r \le 2^r(|a|^r + |b|^r)$ is trivially true since

$$|a + b| \le \max(2|a|, 2|b|).$$

An immediate application of (20) is the following result.


Theorem 8. Let $X$ and $Y$ be RVs and $r > 0$ be a fixed number. If $E|X|^r$, $E|Y|^r$ are both finite, so also is $E|X + Y|^r$.

Proof. Let $a = X$ and $b = Y$ in (20). Taking the expectation on both sides, we see that

$$E|X + Y|^r \le c_r(E|X|^r + E|Y|^r),$$

where $c_r = 1$ if $0 < r \le 1$ and $= 2^{r-1}$ if $r > 1$.

Next we establish Hölder's inequality,

$$|xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q}, \tag{21}$$

where $p$ and $q$ are positive real numbers such that $p > 1$ and $1/p + 1/q = 1$. Note that for $x > 0$ the function $w = \log x$ is concave. It follows that for $x_1, x_2 > 0$ and $0 \le t \le 1$,

$$\log[tx_1 + (1 - t)x_2] \ge t\log x_1 + (1 - t)\log x_2.$$

Taking antilogarithms, we get

$$x_1^tx_2^{1-t} \le tx_1 + (1 - t)x_2.$$

Now we choose $x_1 = |x|^p$, $x_2 = |y|^q$, $t = 1/p$, $1 - t = 1/q$, where $p > 1$ and $1/p + 1/q = 1$, to get (21).


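Theorem 9. Let $p > 1$, $q > 1$, so that $1/p + 1/q = 1$. Then

$$E|XY| \le (E|X|^p)^{1/p}(E|Y|^q)^{1/q}. \tag{22}$$


Proof. We may assume that $0 < E|X|^p < \infty$ and $0 < E|Y|^q < \infty$, the remaining cases being trivial. By Hölder's inequality (21), letting $x = X[E|X|^p]^{-1/p}$, $y = Y[E|Y|^q]^{-1/q}$, we get

$$|XY| \le p^{-1}|X|^p[E|X|^p]^{1/p - 1}[E|Y|^q]^{1/q} + q^{-1}|Y|^q[E|Y|^q]^{1/q - 1}[E|X|^p]^{1/p}.$$

Taking the expectation on both sides leads to (22).

Corollary. Taking $p = q = 2$, we obtain the Cauchy–Schwarz inequality,

$$E|XY| \le (E|X|^2)^{1/2}(E|Y|^2)^{1/2}.$$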

The final result of this section is an inequality due to Minkowski.

Theorem 10. For $p \ge 1$,

$$[E|X + Y|^p]^{1/p} \le [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}. \tag{23}$$

Proof. We have, for $p > 1$,

$$|X + Y|^p \le |X|\,|X + Y|^{p-1} + |Y|\,|X + Y|^{p-1}.$$

Taking expectations and using Hölder's inequality (22) with $Y$ replaced by $|X + Y|^{p-1}$ ($p > 1$), we have

$$E|X + Y|^p \le [E|X|^p]^{1/p}[E|X + Y|^{(p-1)q}]^{1/q} + [E|Y|^p]^{1/p}[E|X + Y|^{(p-1)q}]^{1/q}$$
$$= \bigl\{[E|X|^p]^{1/p} + [E|Y|^p]^{1/p}\bigr\}\,[E|X + Y|^{(p-1)q}]^{1/q}.$$

Excluding the trivial case in which $E|X + Y|^p = 0$, and noting that $(p-1)q = p$ and that $E|X + Y|^p$ is finite by Theorem 8 whenever the right-hand side of (23) is finite, we have, after dividing both sides of the last inequality by $[E|X + Y|^p]^{1/q}$,

$$[E|X + Y|^p]^{1/p} \le [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}, \quad p > 1.$$

The case $p = 1$ being trivial, this establishes (23).
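As a quick numerical illustration of (22) and (23), the sketch below evaluates both sides on a randomly generated discrete distribution with 50 atoms. The distribution, the seed, and the helper names are illustrative assumptions; the case $p = 2$ is the Cauchy–Schwarz inequality.

```python
import random

def expect(values, probs):
    """Expectation of a discrete RV: sum of value * probability."""
    return sum(v * w for v, w in zip(values, probs))

def lp_norm(values, probs, p):
    """(E|X|^p)^(1/p) for a discrete RV with the given atoms and probabilities."""
    return expect([abs(v) ** p for v in values], probs) ** (1 / p)

random.seed(1)
k = 50                                     # number of atoms (illustrative)
weights = [random.random() for _ in range(k)]
total = sum(weights)
probs = [w / total for w in weights]
X = [random.gauss(0, 1) for _ in range(k)]
Y = [random.gauss(0, 2) for _ in range(k)]
S = [x + y for x, y in zip(X, Y)]

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)                        # conjugate exponent: 1/p + 1/q = 1
    holder_lhs = expect([abs(x * y) for x, y in zip(X, Y)], probs)
    holder_rhs = lp_norm(X, probs, p) * lp_norm(Y, probs, q)
    assert holder_lhs <= holder_rhs * (1 + 1e-12)      # inequality (22)
    mink_lhs = lp_norm(S, probs, p)
    mink_rhs = lp_norm(X, probs, p) + lp_norm(Y, probs, p)
    assert mink_lhs <= mink_rhs * (1 + 1e-12)          # inequality (23)
    print(p, holder_lhs, holder_rhs, mink_lhs, mink_rhs)
```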




PROBLEMS 4.5 
1. Suppose that the RV $(X, Y)$ is uniformly distributed over the region $R = \{(x, y): 0 < x < y < 1\}$. Find the covariance between $X$ and $Y$.
2. Let $(X, Y)$ have the joint PDF given by

$$f(x, y) = \begin{cases} x^2 + \dfrac{xy}{3} & \text{if } 0 < x < 1,\ 0 < y < 2,\\ 0 & \text{otherwise}.\end{cases}$$

Find all moments of order 2.


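3. Let $(X, Y)$ be distributed with joint density

$$f(x, y) = \begin{cases} \tfrac{1}{4}[1 + xy(x^2 - y^2)] & \text{if } |x| \le 1,\ |y| \le 1,\\ 0 & \text{otherwise}.\end{cases}$$

Find the MGF of $(X, Y)$. Are $X, Y$ independent? If not, find the covariance between $X$ and $Y$.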


4. For a positive RV $X$ with finite first moment, show that (a) $E\sqrt{X} \le \sqrt{EX}$ and (b) $E(1/X) \ge 1/EX$.


5. If $X$ is a nondegenerate RV with finite expectation and such that $X \ge a > 0$, then

$$E\sqrt{X^2 - a^2} < \sqrt{(EX)^2 - a^2}.$$

(Kruskal [54])
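6. Show that for $x > 0$,

$$\Bigl(\int_x^\infty t e^{-t^2/2}\,dt\Bigr)^2 \le \int_x^\infty e^{-t^2/2}\,dt \int_x^\infty t^2 e^{-t^2/2}\,dt,$$

and hence that

$$\int_x^\infty e^{-t^2/2}\,dt \ge \frac{1}{2}\bigl[(4 + x^2)^{1/2} - x\bigr]e^{-x^2/2}.$$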


7. Given a PDF $f$ that is nondecreasing in the interval $a \le x \le b$, show that for any $s > 0$

$$\int_a^b x^s f(x)\,dx \ge \frac{b^{s+1} - a^{s+1}}{(s+1)(b - a)}\int_a^b f(x)\,dx,$$

with the inequality reversed if $f$ is nonincreasing.
8. Derive the Lyapunov inequality (Theorem 3.4.3)

$$[E|X|^r]^{1/r} \le [E|X|^s]^{1/s}, \quad 1 \le r < s < \infty,$$

from Hölder's inequality (22).


9. Let $X$ be an RV with $E|X|^r < \infty$ for $r > 0$. Show that the function $\log E|X|^r$ is a convex function of $r$.


10. Show with the help of an example that Theorem 9 is not true for $p < 1$.


11. Show that the converse of Theorem 8 also holds for independent RVs; that is, if $E|X + Y|^r < \infty$ for some $r > 0$ and $X$ and $Y$ are independent, then $E|X|^r < \infty$, $E|Y|^r < \infty$. (Hint: Without loss of generality, assume that the median of both $X$ and $Y$ is 0. Show that for any $t > 0$, $P\{|X + Y| > t\} \ge \frac{1}{2}P\{|X| > t\}$. Now use the remarks preceding Lemma 3.2.2 to conclude that $E|X|^r < \infty$.)


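12. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A_1, A_2, \ldots, A_n$ be events in $\mathcal{S}$ such that $P\bigl(\bigcup_{k=1}^{n} A_k\bigr) > 0$. Show that

$$2\sum_{1 \le j < k \le n} P(A_j A_k) \ge \frac{\bigl(\sum_{k=1}^{n} PA_k\bigr)^2}{P\bigl(\bigcup_{k=1}^{n} A_k\bigr)} - \sum_{k=1}^{n} PA_k.$$

(Hint: Let $X_k$ be the indicator function of $A_k$, $k = 1, 2, \ldots, n$. Use the Cauchy–Schwarz inequality.) (Chung and Erdős [13])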


13. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$ with $0 < PA < 1$, $0 < PB < 1$. Define $\rho(A, B)$ by $\rho(A, B) =$ correlation coefficient between RVs $I_A$ and $I_B$, where $I_A$, $I_B$ are the indicator functions of $A$ and $B$, respectively. Express $\rho(A, B)$ in terms of $PA$, $PB$, and $P(AB)$, and conclude that $\rho(A, B) = 0$ if and only if $A$ and $B$ are independent. What happens if $A = B$ or if $A = B^c$?

(a) Show that

$$\rho(A, B) > 0 \iff P\{A \mid B\} > PA \iff P\{B \mid A\} > PB$$

and

$$\rho(A, B) < 0 \iff P\{A \mid B\} < PA \iff P\{B \mid A\} < PB.$$

(b) Show that

$$\rho(A, B) = \frac{P(AB)P(A^cB^c) - P(AB^c)P(A^cB)}{(PA \cdot PA^c \cdot PB \cdot PB^c)^{1/2}}.$$


14. Let $X_1, X_2, \ldots, X_n$ be iid RVs, and define

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad\text{and}\quad S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}.$$

Suppose that the common distribution is symmetric. Assuming the existence of moments of appropriate order, show that $\operatorname{cov}(\bar{X}, S^2) = 0$.

15. Let $X, Y$ be iid RVs with common standard normal density

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad -\infty < x < \infty.$$

Let $U = X + Y$ and $V = X^2 + Y^2$. Find the MGF of the random variable $(U, V)$. Also, find the correlation coefficient between $U$ and $V$. Are $U$ and $V$ independent?

16. Let $X$ and $Y$ be two discrete RVs:

$$P\{X = x_1\} = p_1, \quad P\{X = x_2\} = 1 - p_1,$$

and

$$P\{Y = y_1\} = p_2, \quad P\{Y = y_2\} = 1 - p_2.$$

Show that $X$ and $Y$ are independent if and only if the correlation coefficient between $X$ and $Y$ is zero.

17. Let $X$ and $Y$ be dependent RVs with common means 0, variances 1, and correlation coefficient $\rho$. Show that

$$E\{\max(X^2, Y^2)\} \le 1 + \sqrt{1 - \rho^2}.$$

18. Let $X_1, X_2$ be independent normal RVs with density functions

$$f_i(x) = \frac{1}{\sigma_i\sqrt{2\pi}}\exp\Bigl[-\frac{1}{2}\Bigl(\frac{x - \mu_i}{\sigma_i}\Bigr)^2\Bigr], \quad -\infty < x < \infty;\ i = 1, 2.$$

Also let

$$Z = X_1\cos\theta + X_2\sin\theta \quad\text{and}\quad W = X_2\cos\theta - X_1\sin\theta.$$

Find the correlation coefficient, $\rho$, between $Z$ and $W$, and show that

$$0 \le \rho^2 \le \Bigl(\frac{\sigma_1^2 - \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\Bigr)^2.$$

19. Let $(X_1, X_2, \ldots, X_n)$ be an RV such that the correlation coefficient between each pair $X_i, X_j$, $i \ne j$, is $\rho$. Show that $-(n-1)^{-1} \le \rho \le 1$.
20. Let $X_1, X_2, \ldots, X_{m+n}$ be iid RVs with finite second moment. Let $S_k = \sum_{j=1}^{k} X_j$, $k = 1, 2, \ldots, m + n$. Find the correlation coefficient between $S_n$ and $S_{m+n} - S_m$, where $n > m$.


21. Let $f$ be the PDF of a positive RV, and write

$$g(x, y) = \begin{cases} \dfrac{f(x + y)}{x + y} & \text{if } x > 0,\ y > 0,\\ 0 & \text{otherwise}.\end{cases}$$

Show that $g$ is a density function in the plane. If the $m$th moment of $f$ exists for some positive integer $m$, find $EX^m$. Compute the means and variances of $X$ and $Y$ and the correlation coefficient between $X$ and $Y$ in terms of moments of $f$. (Adapted from Feller [23, p. 100].)


22. A die is thrown $n + 2$ times. After each throw a $+$ sign is recorded for 4, 5, or 6, and a $-$ sign for 1, 2, or 3, the signs forming an ordered sequence. To each sign, except the first and the last, is attached a characteristic RV that assumes the value 1 if both the neighboring signs differ from the one between them, and 0 otherwise. Let $X_1, X_2, \ldots, X_n$ be these characteristic RVs, where $X_i$ corresponds to the $(i + 1)$st sign ($i = 1, 2, \ldots, n$) in the sequence. Show that

$$E\Bigl(\sum_{i=1}^{n} X_i\Bigr) = \frac{n}{4} \quad\text{and}\quad \operatorname{var}\Bigl(\sum_{i=1}^{n} X_i\Bigr) = \frac{5n - 2}{16}.$$


23. Let $(X, Y)$ be jointly distributed with PDF $f$ defined by $f(x, y) = \frac{1}{2}$ inside the square with corners at the points $(0, 1)$, $(1, 0)$, $(-1, 0)$, $(0, -1)$ in the $(x, y)$-plane (a square of area 2), and $f(x, y) = 0$ otherwise. Are $X, Y$ independent? Are they uncorrelated?


4.6 CONDITIONAL EXPECTATION 


In Section 4.2 we defined the conditional distribution of an RV $X$, given $Y$. We showed that if $(X, Y)$ is of the discrete type, the conditional PMF of $X$, given $Y = y_j$, where $P\{Y = y_j\} > 0$, is a PMF when considered as a function of the $x_i$'s (for fixed $y_j$). Similarly, if $(X, Y)$ is an RV of the continuous type with PDF $f(x, y)$ and marginal densities $f_1$ and $f_2$, respectively, then at every point $(x, y)$ at which $f$ is continuous and at which $f_2(y) > 0$ and is continuous, a conditional density function of $X$, given $Y$, exists and may be defined by

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}.$$
We also showed that $f_{X|Y}(x \mid y)$, for fixed $y$, when considered as a function of $x$ is a PDF in its own right. Therefore, we can (and do) consider the moments of this conditional distribution.


Definition 1. Let $X$ and $Y$ be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$, and let $h$ be a Borel-measurable function. Then the conditional expectation of $h(X)$, given $Y$, written as $E\{h(X) \mid Y\}$, is an RV that takes the value $E\{h(X) \mid y\}$, defined by

$$E\{h(X) \mid y\} = \begin{cases} \displaystyle\sum_x h(x)P\{X = x \mid Y = y\} & \text{if } (X, Y) \text{ is of the discrete type and } P\{Y = y\} > 0,\\[2mm] \displaystyle\int_{-\infty}^{\infty} h(x)f_{X|Y}(x \mid y)\,dx & \text{if } (X, Y) \text{ is of the continuous type and } f_2(y) > 0, \end{cases} \tag{1}$$

when the RV $Y$ assumes the value $y$.


Needless to say, a similar definition may be given for the conditional expectation $E\{h(Y) \mid X\}$.

It is immediate that $E\{h(X) \mid Y\}$ satisfies the usual properties of an expectation provided we remember that $E\{h(X) \mid Y\}$ is not a constant but an RV. The following results are easy to prove. We assume the existence of the indicated expectations.


$$E\{c \mid Y\} = c \quad \text{for any constant } c \tag{2}$$

and

$$E\{[a_1g_1(X) + a_2g_2(X)] \mid Y\} = a_1E\{g_1(X) \mid Y\} + a_2E\{g_2(X) \mid Y\} \tag{3}$$

for any Borel functions $g_1, g_2$.

$$P(X \ge 0) = 1 \implies E\{X \mid Y\} \ge 0 \tag{4}$$

and

$$P(X_1 \ge X_2) = 1 \implies E\{X_1 \mid Y\} \ge E\{X_2 \mid Y\}. \tag{5}$$

The statements in (3), (4), and (5) should be understood to hold with probability 1.

$$E\{X \mid Y\} = E(X), \quad E\{Y \mid X\} = E(Y) \tag{6}$$

for independent RVs $X$ and $Y$.

If $\phi(X, Y)$ is a function of $X$ and $Y$, then

$$E\{\phi(X, Y) \mid y\} = E\{\phi(X, y) \mid y\}, \tag{7}$$

and

$$E\{\psi(X)\phi(X, Y) \mid X\} = \psi(X)E\{\phi(X, Y) \mid X\} \tag{8}$$

for any Borel function $\psi$.

Again, (8) should be understood as holding with probability 1. Relation (7) is useful as a computational device. See Example 3 below.

The moments of a conditional distribution are defined in the usual manner. Thus, for $r \ge 0$, $E\{X^r \mid Y\}$ defines the $r$th moment of the conditional distribution. We can define the central moments of the conditional distribution and, in particular, the variance. There is no difficulty in generalizing these concepts for $n$-dimensional distributions when $n > 2$. We leave it to the reader to furnish the details.


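Example 1. An urn contains three red and two green balls. A random sample of two balls is drawn (a) with replacement, and (b) without replacement. Let $X = 0$ if the first ball drawn is green, $= 1$ if the first ball drawn is red, and let $Y = 0$ if the second ball drawn is green, $= 1$ if the second ball drawn is red.

The joint PMF of $(X, Y)$ is given in the following tables (rows: values of $X$; columns: values of $Y$):

(a) With replacement

$$\begin{array}{c|cc|c} X \backslash Y & 0 & 1 & P\{X = x\}\\ \hline 0 & 4/25 & 6/25 & 2/5\\ 1 & 6/25 & 9/25 & 3/5\\ \hline P\{Y = y\} & 2/5 & 3/5 & 1 \end{array}$$

(b) Without replacement

$$\begin{array}{c|cc|c} X \backslash Y & 0 & 1 & P\{X = x\}\\ \hline 0 & 1/10 & 3/10 & 2/5\\ 1 & 3/10 & 3/10 & 3/5\\ \hline P\{Y = y\} & 2/5 & 3/5 & 1 \end{array}$$

The conditional PMFs and the conditional expectations are as follows:

(a)
$$P\{X = x \mid Y = 0\} = P\{X = x \mid Y = 1\} = \begin{cases} 2/5, & x = 0,\\ 3/5, & x = 1,\end{cases} \qquad P\{Y = y \mid X = 0\} = P\{Y = y \mid X = 1\} = \begin{cases} 2/5, & y = 0,\\ 3/5, & y = 1,\end{cases}$$

$$E\{X \mid y\} = \frac{3}{5}, \quad y = 0, 1; \qquad E\{Y \mid x\} = \frac{3}{5}, \quad x = 0, 1.$$

(b)
$$P\{X = x \mid Y = 0\} = \begin{cases} 1/4, & x = 0,\\ 3/4, & x = 1,\end{cases} \qquad P\{Y = y \mid X = 0\} = \begin{cases} 1/4, & y = 0,\\ 3/4, & y = 1,\end{cases}$$

$$P\{X = x \mid Y = 1\} = \begin{cases} 1/2, & x = 0,\\ 1/2, & x = 1,\end{cases} \qquad P\{Y = y \mid X = 1\} = \begin{cases} 1/2, & y = 0,\\ 1/2, & y = 1,\end{cases}$$

$$E\{X \mid y\} = \begin{cases} 3/4, & y = 0,\\ 1/2, & y = 1,\end{cases} \qquad E\{Y \mid x\} = \begin{cases} 3/4, & x = 0,\\ 1/2, & x = 1.\end{cases}$$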
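Example 2. For the RV $(X, Y)$ considered in Examples 4.2.5 and 4.2.7, where $f_{Y|X}(y \mid x) = 1/(1 - x)$, $x < y < 1$, and $f_{X|Y}(x \mid y) = 1/y$, $0 < x < y$,

$$E\{Y \mid x\} = \int_x^1 y f_{Y|X}(y \mid x)\,dy = \frac{1}{1 - x}\cdot\frac{1 - x^2}{2} = \frac{1 + x}{2}, \quad 0 < x < 1,$$

and

$$E\{X \mid y\} = \int_0^y x f_{X|Y}(x \mid y)\,dx = \frac{y}{2}, \quad 0 < y < 1.$$

Also,

$$E\{X^2 \mid y\} = \int_0^y x^2\,\frac{1}{y}\,dx = \frac{y^2}{3}, \quad 0 < y < 1,$$

and

$$\operatorname{var}\{X \mid y\} = E\{X^2 \mid y\} - [E\{X \mid y\}]^2 = \frac{y^2}{3} - \frac{y^2}{4} = \frac{y^2}{12}, \quad 0 < y < 1.$$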
Theorem 1. Let $Eh(X)$ exist. Then

$$Eh(X) = E\{E\{h(X) \mid Y\}\}. \tag{9}$$


Proof. Let $(X, Y)$ be of the discrete type. Then

$$E\{E\{h(X) \mid Y\}\} = \sum_y\Bigl[\sum_x h(x)P\{X = x \mid Y = y\}\Bigr]P\{Y = y\}$$
$$= \sum_y\Bigl[\sum_x h(x)P\{X = x, Y = y\}\Bigr]$$
$$= \sum_x h(x)\sum_y P\{X = x, Y = y\}$$
$$= Eh(X).$$

The proof in the continuous case is similar.
Theorem 1 is quite useful in the computation of $Eh(X)$ in many applications.
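A minimal Python check of Theorem 1 on Example 1(b): it computes $E\{X \mid y\}$ from definition (1) and verifies that averaging over the distribution of $Y$ recovers $EX = 3/5$. Exact fractions avoid rounding; the function names are hypothetical.

```python
from fractions import Fraction as Fr

# Joint PMF of (X, Y) in Example 1(b): two draws without replacement from an
# urn with 3 red balls (coded 1) and 2 green balls (coded 0).
joint = {(0, 0): Fr(1, 10), (0, 1): Fr(3, 10), (1, 0): Fr(3, 10), (1, 1): Fr(3, 10)}

def marginal_y(y):
    """P{Y = y}."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_exp_x_given_y(y, h=lambda x: x):
    """E{h(X) | y} = sum over x of h(x) P{X = x | Y = y}, as in definition (1)."""
    py = marginal_y(y)
    return sum(h(x) * p / py for (x, yy), p in joint.items() if yy == y)

tower = sum(cond_exp_x_given_y(y) * marginal_y(y) for y in (0, 1))   # E{E{X | Y}}
direct = sum(x * p for (x, _), p in joint.items())                   # EX
print(cond_exp_x_given_y(0), cond_exp_x_given_y(1), tower, direct)
```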
Example 3. Let $X$ and $Y$ be independent continuous RVs with respective PDFs $f$ and $g$ and DFs $F$ and $G$. Then $P\{X < Y\}$ is of interest in many statistical applications. In view of Theorem 1,

$$P\{X < Y\} = EI_{\{X < Y\}} = E\{E\{I_{\{X < Y\}} \mid Y\}\},$$

where $I_A$ is the indicator function of event $A$. Now, by (7) and independence,

$$E\{I_{\{X < Y\}} \mid Y = y\} = E\{I_{\{X < y\}} \mid y\} = P\{X < y\} = F(y),$$

and it follows that

$$P\{X < Y\} = EF(Y) = \int_{-\infty}^{\infty} F(y)g(y)\,dy.$$

If, in particular, $X$ and $Y$ are identically distributed (so that $g = f$), then

$$P\{X < Y\} = \int_{-\infty}^{\infty} F(y)f(y)\,dy = \frac{1}{2}.$$

More generally,

$$P\{X - Y \le z\} = E\{E\{I_{\{X - Y \le z\}} \mid Y\}\} = EF(Y + z) = \int_{-\infty}^{\infty} F(y + z)g(y)\,dy$$

gives the DF of $Z = X - Y$ as computed in the corollary to Theorem 4.4.3.
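The formula $P\{X < Y\} = \int F(y)g(y)\,dy$ can be evaluated numerically. The sketch below assumes, purely for illustration, that $X$ and $Y$ are exponential with rates 1 and 2, for which the integral equals $1/3$. It uses a plain midpoint rule truncated at an upper limit, so the result is approximate.

```python
import math

def prob_x_less_y(F, g, upper=50.0, steps=100_000):
    """Midpoint-rule approximation of P{X < Y} = integral of F(y) g(y) dy,
    assuming X and Y are independent and Y lives on [0, inf), truncated at `upper`."""
    h = upper / steps
    return h * sum(F((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps))

def F(y):
    """DF of X: exponential with rate 1 (illustrative choice)."""
    return 1 - math.exp(-y)

def g(y):
    """PDF of Y: exponential with rate 2 (illustrative choice)."""
    return 2 * math.exp(-2 * y)

approx = prob_x_less_y(F, g)
print(approx)            # should be close to 1/3
```

Passing the PDF of $X$ itself as `g` reproduces the identically distributed case, where the value is $1/2$.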


Example 4. Consider the joint PDF

$$f(x, y) = xe^{-x(1 + y)}, \quad x > 0,\ y > 0, \text{ and zero otherwise}$$

of $(X, Y)$. Then

$$f_X(x) = e^{-x}, \quad x > 0, \text{ and zero otherwise}$$

and

$$f_Y(y) = \frac{1}{(1 + y)^2}, \quad y > 0, \text{ and zero otherwise}.$$

Clearly, $EY$ does not exist but

$$E\{Y \mid x\} = \int_0^\infty yxe^{-xy}\,dy = \frac{1}{x}.$$


Theorem 2. If $EX^2 < \infty$, then

$$\operatorname{var}(X) = \operatorname{var}(E\{X \mid Y\}) + E(\operatorname{var}\{X \mid Y\}). \tag{10}$$

Proof. The right-hand side of (10) equals, by definition,

$$\bigl\{E(E\{X \mid Y\})^2 - [E(E\{X \mid Y\})]^2\bigr\} + E\bigl(E\{X^2 \mid Y\} - (E\{X \mid Y\})^2\bigr)$$
$$= E(E\{X \mid Y\})^2 - (EX)^2 + EX^2 - E(E\{X \mid Y\})^2$$
$$= \operatorname{var}(X).$$

Corollary. If $EX^2 < \infty$, then

$$\operatorname{var}(X) \ge \operatorname{var}(E\{X \mid Y\}) \tag{11}$$

with equality if and only if $X$ is a function of $Y$.


Equation (11) follows immediately from (10). The equality in (11) holds if and only if

$$E(\operatorname{var}\{X \mid Y\}) = E(X - E\{X \mid Y\})^2 = 0,$$

which holds if and only if with probability 1

$$X = E\{X \mid Y\}. \tag{12}$$

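Example 5. Let $X_1, X_2, \ldots$ be iid RVs and let $N$ be a positive integer-valued RV. Let $S_N = \sum_{k=1}^{N} X_k$, and suppose that the $X$'s and $N$ are independent. Then

$$E(S_N) = E\{E(S_N \mid N)\}.$$

Now

$$E\{S_N \mid N = n\} = E\{S_n \mid N = n\} = nEX_1,$$

so that

$$E(S_N) = E(NEX_1) = (EN)(EX_1).$$

Again, we have assumed above and below that all indicated expectations exist. Also,

$$\operatorname{var}(S_N) = \operatorname{var}(E\{S_N \mid N\}) + E(\operatorname{var}\{S_N \mid N\}).$$

First,

$$\operatorname{var}(E\{S_N \mid N\}) = \operatorname{var}(NEX_1) = (EX_1)^2\operatorname{var}(N).$$

Second,

$$\operatorname{var}\{S_N \mid N = n\} = n\operatorname{var}(X_1),$$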


so

$$E(\operatorname{var}\{S_N \mid N\}) = (EN)\operatorname{var}(X_1).$$

It follows that

$$\operatorname{var}(S_N) = (EX_1)^2\operatorname{var}(N) + (EN)\operatorname{var}(X_1).$$
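The identities $E(S_N) = (EN)(EX_1)$ and $\operatorname{var}(S_N) = (EX_1)^2\operatorname{var}(N) + (EN)\operatorname{var}(X_1)$ can be checked exactly for small discrete cases. The sketch below assumes, for illustration only, $X_i \sim b(1, 1/3)$ and $N$ uniform on $\{1, 2, 3\}$. It builds the PMF of $S_N$ by convolution and compares its moments with the formulas.

```python
from fractions import Fraction as Fr

def convolve(p, q):
    """PMF of the sum of two independent integer-valued RVs given as {value: prob}."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out

def mean_var(pmf):
    """Mean and variance of a PMF given as {value: prob}."""
    m = sum(k * p for k, p in pmf.items())
    return m, sum(k * k * p for k, p in pmf.items()) - m * m

# Illustrative assumptions: X_i ~ b(1, 1/3), N uniform on {1, 2, 3}, N independent of the X's.
x_pmf = {0: Fr(2, 3), 1: Fr(1, 3)}
n_pmf = {1: Fr(1, 3), 2: Fr(1, 3), 3: Fr(1, 3)}

s_pmf = {}
partial = {0: Fr(1)}                      # PMF of S_0 = 0
for n in range(1, max(n_pmf) + 1):
    partial = convolve(partial, x_pmf)    # PMF of S_n
    for k, p in partial.items():          # mix over the event {N = n}
        s_pmf[k] = s_pmf.get(k, 0) + n_pmf.get(n, 0) * p

ex, vx = mean_var(x_pmf)
en, vn = mean_var(n_pmf)
es, vs = mean_var(s_pmf)
print(es, en * ex)                        # E(S_N) vs (EN)(EX_1)
print(vs, ex ** 2 * vn + en * vx)         # var(S_N) vs the formula above
```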
PROBLEMS 4.6 


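1. Let $X$ be an RV with PDF given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Bigl[-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\Bigr], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0.$$

Find $E\{X \mid a < X < b\}$, where $a$ and $b$ are constants.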


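2. (a) Let $(X, Y)$ be jointly distributed with density

$$f(x, y) = \begin{cases} y(1 + x)^{-4}e^{-y(1 + x)^{-1}}, & x, y \ge 0,\\ 0, & \text{otherwise}.\end{cases}$$

Find $E\{Y \mid X\}$.
(b) Do the same for the joint density

$$f(x, y) = \begin{cases} \tfrac{4}{5}(x + 3y)e^{-x - 2y}, & x, y \ge 0,\\ 0, & \text{otherwise}.\end{cases}$$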


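3. Let $(X, Y)$ be jointly distributed with bivariate normal density

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\Bigl\{-\frac{1}{2(1 - \rho^2)}\Bigl[\Bigl(\frac{x - \mu_1}{\sigma_1}\Bigr)^2 - 2\rho\frac{(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \Bigl(\frac{y - \mu_2}{\sigma_2}\Bigr)^2\Bigr]\Bigr\}.$$

Find $E\{X \mid y\}$ and $E\{Y \mid x\}$. (Here, $\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1, \sigma_2 > 0$, and $|\rho| < 1$.)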
4. Find $E(Y - E\{Y \mid X\})^2$.
5. Show that $E(Y - \phi(X))^2$ is minimized by choosing $\phi(X) = E\{Y \mid X\}$.
6. Let $X$ have PMF

$$P_\lambda(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

and suppose that $\lambda$ is a realization of an RV $\Lambda$ with PDF

$$f(\lambda) = e^{-\lambda}, \quad \lambda \ge 0.$$

Find $E\{e^{-\Lambda} \mid X = 1\}$.

7. Find $E(XY)$ by conditioning on $X$ or $Y$ for the following cases:
(a) $f(x, y) = xe^{-x(1 + y)}$, $x > 0$, $y > 0$, and zero otherwise.
(b) $f(x, y) = 2$, $0 < y < x < 1$, and zero otherwise.


8. Suppose that $X$ has uniform PDF $f(x) = 1$, $0 < x < 1$, and zero otherwise. Let $Y$ be chosen from the interval $(0, X]$ according to the PDF

$$f(y \mid x) = \frac{1}{x}, \quad 0 < y \le x, \text{ and zero otherwise}.$$

Find $E\{Y^k \mid X\}$ and $EY^k$ for any fixed constant $k > 0$.
Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional random variable, and $(x_1, x_2, \ldots, x_n)$ be an $n$-tuple assumed by $(X_1, X_2, \ldots, X_n)$. Arrange $(x_1, x_2, \ldots, x_n)$ in increasing order of magnitude so that

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)},$$

where $x_{(1)} = \min(x_1, x_2, \ldots, x_n)$, $x_{(2)}$ is the second smallest value in $x_1, x_2, \ldots, x_n$, and so on, $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$. If any two $x_i, x_j$ are equal, their order does not matter.


Definition 1. The function $X_{(k)}$ of $(X_1, X_2, \ldots, X_n)$ that takes on the value $x_{(k)}$ in each possible sequence $(x_1, x_2, \ldots, x_n)$ of values assumed by $(X_1, X_2, \ldots, X_n)$ is known as the $k$th-order statistic or statistic of order $k$. $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ is called the set of order statistics for $(X_1, X_2, \ldots, X_n)$.


Example 1. Let $X_1, X_2, X_3$ be three RVs of the discrete type. Also, let $X_1, X_3$ take on values 0, 1, and $X_2$ take on values 1, 2, 3. Then the RV $(X_1, X_2, X_3)$ assumes these triplets of values: $(0, 1, 0)$, $(0, 2, 0)$, $(0, 3, 0)$, $(0, 1, 1)$, $(0, 2, 1)$, $(0, 3, 1)$, $(1, 1, 0)$, $(1, 2, 0)$, $(1, 3, 0)$, $(1, 1, 1)$, $(1, 2, 1)$, $(1, 3, 1)$; $X_{(1)}$ takes on values 0, 1; $X_{(2)}$ takes on values 0, 1; and $X_{(3)}$ takes on values 1, 2, 3.
Theorem 1. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV. Let $X_{(k)}$, $1 \le k \le n$, be the statistic of order $k$. Then $X_{(k)}$ is also an RV.


Statistical considerations such as sufficiency, completeness, invariance, and ancil- 
larity (Chapter 8) lead to the consideration of order statistics in problems of statistical 
inference. Order statistics are particularly useful in nonparametric statistics (Chap- 
ter 13), where, for example, many test procedures are based on ranks of observations. 
Many of these methods require the distribution of the ordered observations, which 
we now study. 

In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs. In the discrete case there is no magic formula to compute the distribution of any $X_{(j)}$ or any of the joint distributions. A direct computation is the best course of action.


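Example 2. Suppose that the $X_i$'s are iid with geometric PMF

$$p_k = P\{X = k\} = pq^{k-1}, \quad k = 1, 2, \ldots;\ 0 < p < 1,\ q = 1 - p.$$

Then for any integers $x \ge 1$ and $r \ge 1$,

$$P\{X_{(r)} = x\} = P\{X_{(r)} \le x\} - P\{X_{(r)} \le x - 1\}.$$

Now

$$P\{X_{(r)} \le x\} = P\{\text{at least } r \text{ of the } X\text{'s are} \le x\} = \sum_{i=r}^{n}\binom{n}{i}[P\{X \le x\}]^i[P\{X > x\}]^{n-i}$$

and

$$P\{X > x\} = \sum_{k=x}^{\infty} pq^k = q^x = (1 - p)^x.$$

It follows that

$$P\{X_{(r)} = x\} = \sum_{i=r}^{n}\binom{n}{i}\Bigl[(1 - q^x)^i q^{x(n-i)} - (1 - q^{x-1})^i q^{(x-1)(n-i)}\Bigr], \quad x = 1, 2, \ldots.$$

In particular, let $n = r = 2$. Then

$$P\{X_{(2)} = x\} = (1 - q^x)^2 - (1 - q^{x-1})^2 = pq^{x-1}(2 - q^{x-1} - q^x), \quad x \ge 1.$$

Since $P\{X_{(1)} > x\} = q^{2x}$, we have $P\{X_{(1)} = x\} = (1 - q^2)q^{2(x-1)}$, $x \ge 1$. Also, for integers $x, y \ge 1$ we have

$$P\{X_{(1)} = x, X_{(2)} - X_{(1)} = y\} = P\{X_{(1)} = x, X_{(2)} = x + y\}$$
$$= P\{X_1 = x, X_2 = x + y\} + P\{X_1 = x + y, X_2 = x\}$$
$$= 2pq^{x-1}\cdot pq^{x+y-1} = (1 - q^2)q^{2(x-1)}\cdot\frac{2pq^y}{1 + q}$$

and

$$P\{X_{(1)} = x, X_{(2)} - X_{(1)} = 0\} = P\{X_1 = X_2 = x\} = p^2q^{2(x-1)} = (1 - q^2)q^{2(x-1)}\cdot\frac{p}{1 + q}.$$

Summing over $x$ gives $P\{X_{(2)} - X_{(1)} = 0\} = p/(1 + q)$ and $P\{X_{(2)} - X_{(1)} = y\} = 2pq^y/(1 + q)$, $y \ge 1$, so the joint PMF factors into the product of the marginals. It follows that $X_{(1)}$ and $X_{(2)} - X_{(1)}$ are independent RVs and, moreover, that $X_{(1)}$ has a geometric distribution with parameter $1 - q^2$, while, given $X_{(2)} - X_{(1)} \ge 1$, the difference $X_{(2)} - X_{(1)}$ has a geometric distribution with parameter $p$.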
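The factorization in Example 2 can be verified with exact arithmetic. This sketch fixes $p = 1/3$ (an arbitrary choice), computes the joint PMF of $(X_{(1)}, X_{(2)} - X_{(1)})$ for $n = 2$ directly from the PMF of the $X_i$, and checks that it equals the product of the marginals derived above on a grid of $(x, y)$ values.

```python
from fractions import Fraction as Fr

p = Fr(1, 3)            # illustrative success probability
q = 1 - p

def geom(k):
    """P{X = k} = p q^(k-1), k = 1, 2, ..."""
    return p * q ** (k - 1)

def joint_min_gap(x, y):
    """P{X_(1) = x, X_(2) - X_(1) = y} for n = 2 iid geometric RVs."""
    if y == 0:
        return geom(x) ** 2
    return 2 * geom(x) * geom(x + y)

def p_min(x):
    """P{X_(1) = x}, using P{X_(1) > x} = q^(2x)."""
    return q ** (2 * (x - 1)) - q ** (2 * x)

def p_gap(y):
    """P{X_(2) - X_(1) = y} as derived in the text."""
    return p / (1 + q) if y == 0 else 2 * p * q ** y / (1 + q)

for x in range(1, 8):
    for y in range(0, 8):
        assert joint_min_gap(x, y) == p_min(x) * p_gap(y)
```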


In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs of the continuous type with PDF $f$. Let $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ be the set of order statistics for $X_1, X_2, \ldots, X_n$. Since the $X_i$ are all continuous type RVs, it follows with probability 1 that

$$X_{(1)} < X_{(2)} < \cdots < X_{(n)}.$$


Theorem 2. The joint PDF of $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is given by

$$g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \begin{cases} n!\prod_{i=1}^{n} f(x_{(i)}), & x_{(1)} < x_{(2)} < \cdots < x_{(n)},\\ 0, & \text{otherwise}.\end{cases} \tag{1}$$


Proof. The transformation from $(X_1, X_2, \ldots, X_n)$ to $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is not one-to-one. In fact, there are $n!$ possible arrangements of $x_1, x_2, \ldots, x_n$ in increasing order of magnitude. Thus there are $n!$ inverses to the transformation. For example, one of the $n!$ permutations might be

$$x_4 < x_1 < x_{n-1} < x_3 < \cdots < x_n < x_2.$$

Then the corresponding inverse is

$$x_4 = x_{(1)},\ x_1 = x_{(2)},\ x_{n-1} = x_{(3)},\ x_3 = x_{(4)},\ \ldots,\ x_n = x_{(n-1)},\ x_2 = x_{(n)}.$$

The Jacobian of this transformation is the determinant of an $n \times n$ identity matrix with rows rearranged, since each $x_{(i)}$ equals one and only one of $x_1, x_2, \ldots, x_n$. Therefore, $J = \pm 1$, and

$$f(x_{(2)})f(x_{(n)})f(x_{(4)})f(x_{(1)})\cdots f(x_{(3)})f(x_{(n-1)})\,|J| = \prod_{i=1}^{n} f(x_{(i)}), \quad x_{(1)} < x_{(2)} < \cdots < x_{(n)}.$$

The same expression holds for each of the $n!$ arrangements. It follows (see Remark 4.4.2) that

$$g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \sum_{\text{all } n! \text{ inverses}}\ \prod_{i=1}^{n} f(x_{(i)}) = \begin{cases} n!\,f(x_{(1)})f(x_{(2)})\cdots f(x_{(n)}) & \text{if } x_{(1)} < x_{(2)} < \cdots < x_{(n)},\\ 0 & \text{otherwise}.\end{cases}$$


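Example 3. Let $X_1, X_2, X_3, X_4$ be iid RVs with PDF $f$. The joint PDF of $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}$ is

$$g(y_1, y_2, y_3, y_4) = \begin{cases} 4!\,f(y_1)f(y_2)f(y_3)f(y_4), & y_1 < y_2 < y_3 < y_4,\\ 0, & \text{otherwise}.\end{cases}$$

Let us compute the marginal PDF of $X_{(2)}$. We have

$$g_2(y_2) = 4!\int_{-\infty}^{y_2}\int_{y_2}^{\infty}\int_{y_3}^{\infty} f(y_1)f(y_2)f(y_3)f(y_4)\,dy_4\,dy_3\,dy_1$$
$$= 4!\,f(y_2)\int_{-\infty}^{y_2}\Bigl[\int_{y_2}^{\infty}[1 - F(y_3)]f(y_3)\,dy_3\Bigr]f(y_1)\,dy_1$$
$$= 4!\,f(y_2)\int_{-\infty}^{y_2}\frac{[1 - F(y_2)]^2}{2}f(y_1)\,dy_1$$
$$= 4!\,f(y_2)\frac{[1 - F(y_2)]^2}{2}F(y_2), \quad y_2 \in \mathbb{R}.$$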


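The procedure for computing the marginal PDF of $X_{(r)}$, the $r$th-order statistic of $X_1, X_2, \ldots, X_n$, is similar. The following theorem summarizes the result.


Theorem 3. The marginal PDF of $X_{(r)}$ is given by

$$g_r(y_r) = \frac{n!}{(r-1)!(n-r)!}[F(y_r)]^{r-1}[1 - F(y_r)]^{n-r}f(y_r), \tag{2}$$

where $F$ is the common DF of $X_1, X_2, \ldots, X_n$.


Proof.

$$g_r(y_r) = n!\int_{-\infty}^{y_r}\int_{-\infty}^{y_{r-1}}\cdots\int_{-\infty}^{y_2}\int_{y_r}^{\infty}\int_{y_{r+1}}^{\infty}\cdots\int_{y_{n-1}}^{\infty}\prod_{i=1}^{n} f(y_i)\,dy_n\cdots dy_{r+1}\,dy_1\cdots dy_{r-1}$$
$$= n!\,f(y_r)\frac{[1 - F(y_r)]^{n-r}}{(n-r)!}\int_{-\infty}^{y_r}\cdots\int_{-\infty}^{y_2}\prod_{i=1}^{r-1} f(y_i)\,dy_1\cdots dy_{r-1}$$
$$= n!\,f(y_r)\frac{[1 - F(y_r)]^{n-r}}{(n-r)!}\cdot\frac{[F(y_r)]^{r-1}}{(r-1)!},$$

as asserted.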
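As a numerical check of (2), the sketch below integrates $y\,g_r(y)$ for standard exponential RVs (rate 1, an illustrative choice) and compares the result with the known closed form $EX_{(r)} = \sum_{i=0}^{r-1} 1/(n - i)$ for exponential order statistics. The midpoint rule and the truncation point 40 are assumptions that make the integral approximate.

```python
import math

def order_stat_pdf(y, r, n, F, f):
    """g_r(y) from (2): n!/((r-1)!(n-r)!) F(y)^(r-1) [1 - F(y)]^(n-r) f(y)."""
    c = math.factorial(n) // (math.factorial(r - 1) * math.factorial(n - r))
    return c * F(y) ** (r - 1) * (1 - F(y)) ** (n - r) * f(y)

def mean_order_stat(r, n, F, f, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of E X_(r) = integral of y g_r(y) dy over [0, upper)."""
    h = upper / steps
    return h * sum((i + 0.5) * h * order_stat_pdf((i + 0.5) * h, r, n, F, f)
                   for i in range(steps))

def F(y):
    """Exponential DF with rate 1 (illustrative choice)."""
    return 1 - math.exp(-y)

def f(y):
    """Exponential PDF with rate 1."""
    return math.exp(-y)

n = 5
for r in range(1, n + 1):
    exact = sum(1 / (n - i) for i in range(r))
    assert abs(mean_order_stat(r, n, F, f) - exact) < 1e-4
```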
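We now compute the joint PDF of $X_{(j)}$ and $X_{(k)}$, $1 \le j < k \le n$.


Theorem 4. The joint PDF of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j-1)!(k-j-1)!(n-k)!}[F(y_j)]^{j-1}[F(y_k) - F(y_j)]^{k-j-1}[1 - F(y_k)]^{n-k}f(y_j)f(y_k) & \text{if } y_j < y_k,\\ 0 & \text{otherwise}.\end{cases} \tag{3}$$

Proof. Integrating the joint PDF (1) over all coordinates other than $y_j$ and $y_k$, we get, for $y_j < y_k$,

$$g_{jk}(y_j, y_k) = n!\,f(y_j)f(y_k)\int\limits_{y_1 < \cdots < y_{j-1} < y_j}\prod_{i=1}^{j-1} f(y_i)\,dy_i\ \int\limits_{y_j < y_{j+1} < \cdots < y_{k-1} < y_k}\prod_{i=j+1}^{k-1} f(y_i)\,dy_i\ \int\limits_{y_k < y_{k+1} < \cdots < y_n}\prod_{i=k+1}^{n} f(y_i)\,dy_i.$$

Integrating out $y_n, \ldots, y_{k+1}$ in turn, the last integral equals $[1 - F(y_k)]^{n-k}/(n-k)!$; similarly, the middle integral equals $[F(y_k) - F(y_j)]^{k-j-1}/(k-j-1)!$, and the first equals $[F(y_j)]^{j-1}/(j-1)!$. Hence

$$g_{jk}(y_j, y_k) = n!\,f(y_j)f(y_k)\frac{[F(y_j)]^{j-1}}{(j-1)!}\cdot\frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k-j-1)!}\cdot\frac{[1 - F(y_k)]^{n-k}}{(n-k)!}, \quad y_j < y_k,$$

as asserted.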
In a similar manner we can show that the joint PDF of $X_{(j_1)}, \ldots, X_{(j_k)}$, $1 \le j_1 < j_2 < \cdots < j_k \le n$, $1 \le k \le n$, is given by

$$g_{j_1, j_2, \ldots, j_k}(y_1, y_2, \ldots, y_k) = \frac{n!}{(j_1 - 1)!(j_2 - j_1 - 1)!\cdots(n - j_k)!}[F(y_1)]^{j_1 - 1}f(y_1)[F(y_2) - F(y_1)]^{j_2 - j_1 - 1}f(y_2)\cdots[1 - F(y_k)]^{n - j_k}f(y_k)$$

for $y_1 < y_2 < \cdots < y_k$, and $= 0$ otherwise.


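Example 4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1,\\ 0 & \text{otherwise}.\end{cases}$$

Then

$$g_r(y_r) = \begin{cases} \dfrac{n!}{(r-1)!(n-r)!}\,y_r^{r-1}(1 - y_r)^{n-r}, & 0 < y_r < 1,\\ 0, & \text{otherwise}.\end{cases}$$

The joint distribution of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j-1)!(k-j-1)!(n-k)!}\,y_j^{j-1}(y_k - y_j)^{k-j-1}(1 - y_k)^{n-k}, & 0 < y_j < y_k < 1,\\ 0, & \text{otherwise},\end{cases}$$

where $1 \le j < k \le n$.
The joint PDF of $X_{(1)}$ and $X_{(n)}$ is given by

$$g_{1n}(y_1, y_n) = n(n-1)(y_n - y_1)^{n-2}, \quad 0 < y_1 < y_n < 1,$$

and that of the range $R_n = X_{(n)} - X_{(1)}$ by

$$g_{R_n}(w) = \begin{cases} n(n-1)w^{n-2}(1 - w), & 0 < w < 1,\\ 0, & \text{otherwise}.\end{cases}$$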
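The range density can be sanity-checked two ways: by integrating $w\,g_{R_n}(w)$, which gives $ER_n = (n-1)/(n+1)$, and by simulation. The sketch below does both for $n = 5$. The seed and the number of trials are arbitrary, so the simulated value is only approximate.

```python
import random

def range_pdf(w, n):
    """Density of R_n = X_(n) - X_(1) for n iid U(0, 1) RVs."""
    return n * (n - 1) * w ** (n - 2) * (1 - w) if 0 < w < 1 else 0.0

def simulate_mean_range(n, trials=100_000, seed=42):
    """Monte Carlo estimate of E R_n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        total += max(xs) - min(xs)
    return total / trials

n = 5
steps = 100_000
h = 1 / steps
mean_from_pdf = h * sum((i + 0.5) * h * range_pdf((i + 0.5) * h, n) for i in range(steps))
print(mean_from_pdf, simulate_mean_range(n), (n - 1) / (n + 1))
```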


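Example 5. Let $X_{(1)}, X_{(2)}, X_{(3)}$ be the order statistics of iid RVs $X_1, X_2, X_3$ with common PDF

$$f(x) = \begin{cases} \beta e^{-x\beta}, & x > 0,\\ 0, & \text{otherwise}\end{cases} \qquad (\beta > 0).$$

Let $Y_1 = X_{(3)} - X_{(2)}$ and $Y_2 = X_{(2)}$. We show that $Y_1$ and $Y_2$ are independent. The joint PDF of $X_{(2)}$ and $X_{(3)}$ is given by

$$g_{23}(x, y) = \begin{cases} \dfrac{3!}{1!\,0!\,0!}(1 - e^{-\beta x})\beta e^{-\beta x}\beta e^{-\beta y}, & 0 < x < y,\\ 0, & \text{otherwise}.\end{cases}$$

The PDF of $(Y_1, Y_2)$ is

$$f(y_1, y_2) = 3!\,\beta^2(1 - e^{-\beta y_2})e^{-\beta y_2}e^{-(y_1 + y_2)\beta}$$
$$= \begin{cases} [3!\,\beta e^{-2\beta y_2}(1 - e^{-\beta y_2})]\,[\beta e^{-\beta y_1}], & 0 < y_1 < \infty,\ 0 < y_2 < \infty,\\ 0, & \text{otherwise}.\end{cases}$$

It follows that $Y_1$ and $Y_2$ are independent.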
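Finally, we consider the moments: namely, the means, variances, and covariances of order statistics. Suppose that $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F$.


Let $g$ be a Borel function on $\mathbb{R}$ such that $E|g(X)| < \infty$, where $X$ has DF $F$. Then for $1 \le r \le n$,

$$n\binom{n-1}{r-1}\int_{-\infty}^{\infty}|g(x)|\,[F(x)]^{r-1}[1 - F(x)]^{n-r}f(x)\,dx \le n\binom{n-1}{r-1}\int_{-\infty}^{\infty}|g(x)|f(x)\,dx < \infty \qquad (0 \le F \le 1),$$

and we write

$$Eg(X_{(r)}) = \int_{-\infty}^{\infty} g(y)\,g_r(y)\,dy$$

for $r = 1, 2, \ldots, n$. The converse also holds. Suppose that $E|g(X_{(r)})| < \infty$ for $r = 1, 2, \ldots, n$. Then

$$n\binom{n-1}{r-1}\int_{-\infty}^{\infty}|g(x)|\,[F(x)]^{r-1}[1 - F(x)]^{n-r}f(x)\,dx < \infty$$

for $r = 1, 2, \ldots, n$, and hence, since $\sum_{r=1}^{n}\binom{n-1}{r-1}F^{r-1}(1 - F)^{n-r} = 1$,

$$\sum_{r=1}^{n} n\binom{n-1}{r-1}\int_{-\infty}^{\infty}|g(x)|\,[F(x)]^{r-1}[1 - F(x)]^{n-r}f(x)\,dx = n\int_{-\infty}^{\infty}|g(x)|f(x)\,dx < \infty.$$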


Moreover, it also follows that

$$\sum_{r=1}^{n} Eg(X_{(r)}) = nEg(X).$$

As a consequence of the remarks above, we note that if $E|g(X_{(r)})| = \infty$ for some $r$, $1 \le r \le n$, then $E|g(X)| = \infty$, and conversely, if $E|g(X)| = \infty$, then $E|g(X_{(r)})| = \infty$ for some $r$, $1 \le r \le n$.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid with Pareto PDF $f(x) = 1/x^2$ if $x > 1$, and $= 0$ otherwise.
Then $EX = \infty$. Now for $1 \le r \le n$,

$$EX_{(r)} = n\binom{n-1}{r-1}\int_1^\infty x\Bigl(1 - \frac{1}{x}\Bigr)^{r-1}\Bigl(\frac{1}{x}\Bigr)^{n-r}\frac{dx}{x^2} = n\binom{n-1}{r-1}\int_0^1 y^{n-r-1}(1 - y)^{r-1}\,dy.$$

Since the integral on the right side converges for $1 \le r \le n - 1$ and diverges for $r = n$, we see that $EX_{(r)} = \infty$ for $r = n$.
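For $1 \le r \le n-1$ the integral in Example 6 is a beta integral, and the expectation simplifies to $n\binom{n-1}{r-1}B(n-r, r) = n/(n-r)$. The sketch below evaluates the beta integral exactly for integer arguments and raises an error for $r = n$, where the integral diverges. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """B(a, b) = integral_0^1 y^(a-1) (1-y)^(b-1) dy for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def pareto_mean_order_stat(r, n):
    """E X_(r) = n C(n-1, r-1) * integral_0^1 y^(n-r-1) (1-y)^(r-1) dy, for 1 <= r <= n-1."""
    if r == n:
        raise ValueError("integral diverges: E X_(n) is infinite")
    return n * comb(n - 1, r - 1) * beta_int(n - r, r)

n = 6
print([pareto_mean_order_stat(r, n) for r in range(1, n)])
```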


PROBLEMS 4.7 


1. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics of independent RVs $X_1, X_2, \ldots, X_n$ with common PDF

$$f(x) = \begin{cases} \beta e^{-x\beta} & \text{if } x \ge 0,\\ 0 & \text{otherwise}.\end{cases}$$

(a) Show that $X_{(r)}$ and $X_{(s)} - X_{(r)}$ are independent for any $s > r$.

(b) Find the PDF of $X_{(r+1)} - X_{(r)}$.

(c) Let $Z_1 = nX_{(1)}$, $Z_2 = (n - 1)(X_{(2)} - X_{(1)})$, $Z_3 = (n - 2)(X_{(3)} - X_{(2)})$, $\ldots$, $Z_n = (X_{(n)} - X_{(n-1)})$. Show that $(Z_1, Z_2, \ldots, Z_n)$ and $(X_1, X_2, \ldots, X_n)$ are identically distributed.


2. Let $X_1, X_2, \ldots, X_n$ be iid from PMF

$$p_k = \frac{1}{N}, \quad k = 1, 2, \ldots, N.$$

Find the marginal distributions of $X_{(1)}$, $X_{(n)}$, and their joint PMF.
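3. Let $X_1, X_2, \ldots, X_n$ be iid with a DF

$$F(x) = x^\alpha \quad \text{if } 0 < x < 1 \qquad (F(x) = 0 \text{ for } x \le 0,\ F(x) = 1 \text{ for } x \ge 1),\ \alpha > 0.$$

Show that $X_{(i)}/X_{(n)}$, $i = 1, 2, \ldots, n - 1$, and $X_{(n)}$ are independent.
4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common Pareto PDF $f(x) = \alpha\sigma^\alpha/x^{\alpha+1}$, $x > \sigma$, where $\alpha > 0$, $\sigma > 0$. Show that:
(a) $X_{(1)}$ and $(X_{(2)}/X_{(1)}, \ldots, X_{(n)}/X_{(1)})$ are independent.
(b) $X_{(1)}$ has Pareto$(\sigma, n\alpha)$ distribution.
(c) $\sum_{j=1}^{n}\ln(X_{(j)}/X_{(1)})$ has PDF

$$f(x) = \frac{x^{n-2}e^{-\alpha x}\alpha^{n-1}}{\Gamma(n - 1)}, \quad x > 0.$$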

5. Let $X_1, X_2, \ldots, X_n$ be iid nonnegative RVs of the continuous type. If $E|X| < \infty$, show that $E|X_{(r)}| < \infty$. Write $M_n = X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Show that

$$EM_n = EM_{n-1} + \int_0^\infty F^{n-1}(x)[1 - F(x)]\,dx, \quad n = 2, 3, \ldots.$$

Find $EM_n$ in each of the following cases:
(a) $X_i$ have the common DF

$$F(x) = 1 - e^{-x}, \quad x \ge 0.$$

(b) $X_i$ have the common DF

$$F(x) = x, \quad 0 < x < 1.$$


6. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of $n$ independent RVs $X_1, X_2, \ldots, X_n$ with common PDF $f(x) = 1$ if $0 < x < 1$, and $= 0$ otherwise. Show that $Y_1 = X_{(1)}/X_{(2)}$, $Y_2 = X_{(2)}/X_{(3)}$, $\ldots$, $Y_{n-1} = X_{(n-1)}/X_{(n)}$, and $Y_n = X_{(n)}$ are independent. Find the PDFs of $Y_1, Y_2, \ldots, Y_n$.


7. For the PDF in Problem 4, find $EX_{(r)}$.


8. An urn contains $N$ identical marbles numbered 1 through $N$. From the urn $n$ marbles are drawn at random without replacement, and let $X_{(n)}$ be the largest number drawn. Show that

$$P\{X_{(n)} = k\} = \binom{k-1}{n-1}\Big/\binom{N}{n}, \quad k = n, n + 1, \ldots, N,$$

and $EX_{(n)} = n(N + 1)/(n + 1)$.

CHAPTER 5 
Some Special Distributions 


5.1 INTRODUCTION 


In preceding chapters we studied probability distributions in general. In this chapter 
we study some commonly occurring probability distributions and investigate their 
basic properties. The results of this chapter will be of considerable use in theoretical 
as well as practical applications. We begin with some discrete distributions in Sec- 
tion 5.2 and follow with some continuous models in Section 5.3. Section 5.4 deals 
with bivariate and multivariate normal distributions, and in Section 5.5 we discuss 
the exponential family of distributions. 


5.2 SOME DISCRETE DISTRIBUTIONS 


In this section we study some well-known univariate and multivariate discrete distri- 
butions and describe their important properties. 


5.2.1 Degenerate Distribution 


The simplest distribution is that of an RV $X$ degenerate at point $k$, that is, $P\{X = k\} = 1$ and $= 0$ elsewhere. If we define

$$\varepsilon(x) = \begin{cases} 0 & \text{if } x < 0,\\ 1 & \text{if } x \ge 0,\end{cases} \tag{1}$$

the DF of the RV $X$ is $\varepsilon(x - k)$. Clearly, $EX^l = k^l$, $l = 1, 2, \ldots$, and $M(t) = e^{tk}$. In particular, $\operatorname{var}(X) = 0$. This property characterizes a degenerate RV. As we shall see, the degenerate RV plays an important role in the study of limit theorems.


5.2.2 Two-Point Distribution 


We say that an RV $X$ has a two-point distribution if it takes two values, $x_1$ and $x_2$, with probabilities

$$P\{X = x_1\} = p \quad\text{and}\quad P\{X = x_2\} = 1 - p, \quad 0 < p < 1.$$


We may write

$$X = x_1 I_{[X = x_1]} + x_2 I_{[X = x_2]}, \tag{2}$$

where $I_A$ is the indicator function of $A$. The DF of $X$ is given by

$$F(x) = p\,\varepsilon(x - x_1) + (1 - p)\,\varepsilon(x - x_2). \tag{3}$$

Also,

$$EX^k = px_1^k + (1 - p)x_2^k, \quad k = 1, 2, \ldots, \tag{4}$$

and

$$M(t) = pe^{tx_1} + (1 - p)e^{tx_2} \quad \text{for all } t. \tag{5}$$

In particular,

$$EX = px_1 + (1 - p)x_2 \tag{6}$$

and

$$\operatorname{var}(X) = p(1 - p)(x_1 - x_2)^2. \tag{7}$$

If $x_1 = 1$, $x_2 = 0$, we get the important Bernoulli RV:

$$P\{X = 1\} = p \quad\text{and}\quad P\{X = 0\} = 1 - p, \quad 0 < p < 1. \tag{8}$$

For a Bernoulli RV $X$ with parameter $p$, we write $X \sim b(1, p)$ and have

$$EX = p, \quad \operatorname{var}(X) = p(1 - p), \quad\text{and}\quad M(t) = 1 + p(e^t - 1), \text{ all } t. \tag{9}$$


Bernoulli RVs occur in practice, for example, in coin-tossing experiments. Suppose that $P\{H\} = p$, $0 < p < 1$, and $P\{T\} = 1 - p$. Define RV $X$ so that $X(H) = 1$ and $X(T) = 0$. Then $P\{X = 1\} = p$ and $P\{X = 0\} = 1 - p$. Each repetition of the experiment will be called a trial. More generally, any nontrivial experiment can be dichotomized to yield a Bernoulli model. Let $(\Omega, \mathcal{S}, P)$ be the sample space of an experiment, and let $A \in \mathcal{S}$ with $P(A) = p > 0$. Then $P(A^c) = 1 - p$. Each performance of the experiment is a Bernoulli trial. It will be convenient to call the occurrence of event $A$ a success and the occurrence of $A^c$ a failure.


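Example 1 (Sabharwal [95]). In a sequence of $n$ Bernoulli trials with constant probability $p$ of success ($S$), and $1 - p$ of failure ($F$), let $Y_n$ denote the number of times the combination $SF$ occurs. To find $EY_n$ and $\operatorname{var}(Y_n)$, let $X_i$ represent the event that occurs on the $i$th trial, and define RVs

$$f(X_i, X_{i+1}) = \begin{cases} 1 & \text{if } X_i = S,\ X_{i+1} = F,\\ 0 & \text{otherwise}\end{cases} \qquad (i = 1, 2, \ldots, n - 1).$$

Then

$$Y_n = \sum_{i=1}^{n-1} f(X_i, X_{i+1})$$

and

$$EY_n = (n - 1)p(1 - p).$$

Also, since $f(X_i, X_{i+1})f(X_{i+1}, X_{i+2}) = 0$ (trial $i + 1$ cannot be both $F$ and $S$), while each of the $(n - 2)(n - 3)$ ordered pairs $i \ne j$ with $|i - j| \ge 2$ contributes $p^2(1 - p)^2$,

$$EY_n^2 = E\Bigl[\sum_{i=1}^{n-1} f^2(X_i, X_{i+1})\Bigr] + E\Bigl[\sum_{i \ne j} f(X_i, X_{i+1})f(X_j, X_{j+1})\Bigr] = (n - 1)p(1 - p) + (n - 2)(n - 3)p^2(1 - p)^2,$$

so that

$$\operatorname{var}(Y_n) = p(1 - p)[n - 1 + p(1 - p)(5 - 3n)].$$

If $p = \frac{1}{2}$, then

$$EY_n = \frac{n - 1}{4} \quad\text{and}\quad \operatorname{var}(Y_n) = \frac{n + 1}{16}.$$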
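Because $Y_n$ depends on only $2^n$ outcomes, the formulas for $EY_n$ and $\operatorname{var}(Y_n)$ can be checked by brute-force enumeration for small $n$. The sketch below does this with exact fractions; $n = 7$ and $p = 1/3$ are illustrative choices, and `sf_moments` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def sf_moments(n, p):
    """Exact E Y_n and var(Y_n), Y_n = number of SF pairs, by enumerating all 2^n outcomes."""
    ey = ey2 = Fraction(0)
    for seq in product("SF", repeat=n):
        prob = Fraction(1)
        for s in seq:
            prob *= p if s == "S" else 1 - p
        y = sum(1 for i in range(n - 1) if seq[i] == "S" and seq[i + 1] == "F")
        ey += prob * y
        ey2 += prob * y * y
    return ey, ey2 - ey ** 2

n, p = 7, Fraction(1, 3)
mean, var = sf_moments(n, p)
print(mean, var)
print((n - 1) * p * (1 - p), p * (1 - p) * (n - 1 + p * (1 - p) * (5 - 3 * n)))
```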
5.2.3 Uniform Distribution on n Points 
$X$ is said to have a uniform distribution on $n$ points $\{x_1, x_2, \ldots, x_n\}$ if its PMF is of the form

$$P\{X = x_i\} = \frac{1}{n}, \quad i = 1, 2, \ldots, n. \tag{10}$$

Thus we may write

$$X = \sum_{i=1}^{n} x_i I_{[X = x_i]} \quad\text{and}\quad F(x) = \frac{1}{n}\sum_{i=1}^{n}\varepsilon(x - x_i),$$

$$EX = \frac{1}{n}\sum_{i=1}^{n} x_i, \tag{11}$$

$$EX^l = \frac{1}{n}\sum_{i=1}^{n} x_i^l, \quad l = 1, 2, \ldots, \tag{12}$$

and

$$\operatorname{var}(X) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i\Bigr)^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{13}$$


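Example 2. A box contains tickets numbered 1 to $N$. Let $X$ be the largest number drawn in $n$ random drawings with replacement.
Then $P\{X \le k\} = (k/N)^n$, so that

$$P\{X = k\} = P\{X \le k\} - P\{X \le k - 1\} = \Bigl(\frac{k}{N}\Bigr)^n - \Bigl(\frac{k - 1}{N}\Bigr)^n, \quad k = 1, 2, \ldots, N.$$

Also,

$$EX = N^{-n}\sum_{k=1}^{N}\bigl[k^{n+1} - (k - 1)^{n+1} - (k - 1)^n\bigr] = N^{-n}\Bigl[N^{n+1} - \sum_{k=1}^{N}(k - 1)^n\Bigr].$$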
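The two expressions for $EX$ in Example 2 can be compared exactly for small $N$ and $n$. This Python sketch (hypothetical helper name) returns both: one computed directly from the PMF, the other from the telescoped closed form.

```python
from fractions import Fraction

def max_ticket_mean(N, n):
    """Return E X computed two ways for the largest of n draws (with replacement)
    from tickets numbered 1..N: directly from the PMF, and from the closed form."""
    direct = sum(k * (Fraction(k, N) ** n - Fraction(k - 1, N) ** n)
                 for k in range(1, N + 1))
    closed = Fraction(N ** (n + 1) - sum((k - 1) ** n for k in range(1, N + 1)), N ** n)
    return direct, closed

print(max_ticket_mean(6, 2))
```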


5.2.4 Binomial Distribution 


We say that $X$ has a binomial distribution with parameters $n$ and $p$ if its PMF is given by

$$p_k = P\{X = k\} = \binom{n}{k}p^k(1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n;\ 0 \le p \le 1. \tag{17}$$

Since $\sum_{k=0}^{n} p_k = [p + (1 - p)]^n = 1$, the $p_k$'s indeed define a PMF. If $X$ has PMF (17), we will write $X \sim b(n, p)$. This is consistent with the notation for a Bernoulli RV. We have

$$F(x) = \sum_{k=0}^{n}\binom{n}{k}p^k(1 - p)^{n-k}\varepsilon(x - k).$$
In Example 3.2.5 we showed that

$$EX = np, \tag{18}$$

$$EX^2 = n(n - 1)p^2 + np, \tag{19}$$

and

$$\operatorname{var}(X) = np(1 - p) = npq, \tag{20}$$

where $q = 1 - p$. Also,

$$M(t) = \sum_{k=0}^{n} e^{tk}\binom{n}{k}p^kq^{n-k} = \sum_{k=0}^{n}\binom{n}{k}(pe^t)^kq^{n-k} = (q + pe^t)^n \quad \text{for all } t. \tag{21}$$

The PGF of $X \sim b(n, p)$ is given by $P(s) = \{1 - p(1 - s)\}^n$, $|s| \le 1$.
The binomial distribution can also be considered as the distribution of the sum of $n$ independent, identically distributed $b(1, p)$ random variables. If we toss a coin, with constant probability $p$ of heads and $1 - p$ of tails, $n$ times, the distribution of the number of heads is given by (17). Alternatively, if we write

$$X_k = \begin{cases} 1 & \text{if the } k\text{th toss results in a head},\\ 0 & \text{otherwise},\end{cases}$$

the number of heads in $n$ trials is the sum $S_n = X_1 + X_2 + \cdots + X_n$. Also

$$P\{X_k = 1\} = p \quad\text{and}\quad P\{X_k = 0\} = 1 - p, \quad k = 1, 2, \ldots, n.$$

Thus

$$ES_n = \sum_{i=1}^{n} EX_i = np, \qquad \operatorname{var}(S_n) = \sum_{i=1}^{n}\operatorname{var}(X_i) = np(1 - p),$$

and

$$M(t) = \prod_{i=1}^{n} Ee^{tX_i} = (q + pe^t)^n.$$


Theorem 1. Let $X_i$ ($i = 1, 2, \ldots, k$) be independent RVs with $X_i \sim b(n_i, p)$. Then $S_k = \sum_{i=1}^{k} X_i$ has a $b(n_1 + n_2 + \cdots + n_k, p)$ distribution.

Corollary. If $X_i$ ($i = 1, 2, \ldots, k$) are iid RVs with common PMF $b(n, p)$, then $S_k$ has a $b(nk, p)$ distribution.


Actually, the additive property described in Theorem 1 characterizes the binomial distribution in the following sense. Let $X$ and $Y$ be two independent, nonnegative, finite integer-valued RVs and let $Z = X + Y$. Then $Z$ is a binomial RV with parameter $p$ if and only if $X$ and $Y$ are binomial RVs with the same parameter $p$. The "only if" part is due to Shanbhag and Basawa [101] and will not be proved here.
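Theorem 1 can be illustrated by convolving binomial PMFs exactly. The sketch below assumes $p = 2/5$ and $n_1, n_2, n_3 = 2, 3, 4$ (arbitrary choices), checks that the convolution equals the $b(9, p)$ PMF, and computes its mean and variance for comparison with (18) and (20).

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    """b(n, p) PMF from (17) as {k: P{X = k}}."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

def convolve(a, b):
    """PMF of the sum of two independent integer-valued RVs."""
    out = {}
    for i, pa in a.items():
        for j, pb in b.items():
            out[i + j] = out.get(i + j, 0) + pa * pb
    return out

p = Fraction(2, 5)
sizes = [2, 3, 4]                        # illustrative n_1, n_2, n_3
pmf = {0: Fraction(1)}
for n_i in sizes:
    pmf = convolve(pmf, binom_pmf(n_i, p))
assert pmf == binom_pmf(sum(sizes), p)   # Theorem 1: S_3 ~ b(9, p)
mean = sum(k * w for k, w in pmf.items())
var = sum(k * k * w for k, w in pmf.items()) - mean ** 2
print(mean, var)                         # compare with np and np(1 - p), n = 9
```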


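Example 3. A fair die is rolled $n$ times. The probability of obtaining exactly one 6 is $n\bigl(\frac{1}{6}\bigr)\bigl(\frac{5}{6}\bigr)^{n-1}$, the probability of obtaining no 6 is $\bigl(\frac{5}{6}\bigr)^n$, and the probability of obtaining at least one 6 is $1 - \bigl(\frac{5}{6}\bigr)^n$.

The number of trials needed for the probability of at least one 6 to be $\ge \frac{1}{2}$ is given by the smallest integer $n$ such that

$$1 - \Bigl(\frac{5}{6}\Bigr)^n \ge \frac{1}{2},$$

so that

$$n \ge \frac{\log 2}{\log 6 - \log 5} \approx 3.8,$$

that is, $n = 4$.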
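A direct search confirms the value found in Example 3. The function name is hypothetical and the target probability is a parameter.

```python
import math

def smallest_n(target=0.5):
    """Smallest n with 1 - (5/6)^n >= target."""
    n = 1
    while 1 - (5 / 6) ** n < target:
        n += 1
    return n

print(smallest_n(), math.log(2) / (math.log(6) - math.log(5)))
```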





Example 4. Here $r$ balls are distributed in $n$ cells so that each of the $n^r$ possible arrangements has probability $n^{-r}$. We are interested in the probability $p_k$ that a specified cell has exactly $k$ balls $(k = 0, 1, 2, \ldots, r)$. The distribution of each ball may then be considered as a trial. A success results if the ball goes to the specified cell (with probability $1/n$); otherwise, the trial results in a failure (with probability $1 - 1/n$). Let $X$ denote the number of successes in $r$ trials. Then
$$p_k = P\{X = k\} = \binom{r}{k}\left(\frac{1}{n}\right)^k\left(1 - \frac{1}{n}\right)^{r-k}, \qquad k = 0, 1, 2, \ldots, r.$$
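A brute-force enumeration makes the binomial argument of Example 4 concrete. The sketch below is illustrative only (the function names and the small values $r = 4$, $n = 3$ are our own choices): it lists all $n^r$ equally likely arrangements and compares the exact frequency of $k$ balls in cell 0 with $\binom{r}{k}(1/n)^k(1 - 1/n)^{r-k}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_k_formula(k, r, n):
    # binomial probability of k successes in r trials with success probability 1/n
    return comb(r, k) * Fraction(1, n) ** k * (1 - Fraction(1, n)) ** (r - k)

def p_k_enumerated(k, r, n, cell=0):
    # each of the n**r assignments of r labelled balls to n cells is equally likely
    hits = sum(1 for arrangement in product(range(n), repeat=r)
               if arrangement.count(cell) == k)
    return Fraction(hits, n ** r)

r, n = 4, 3
for k in range(r + 1):
    assert p_k_formula(k, r, n) == p_k_enumerated(k, r, n)
```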


5.2.5 Negative Binomial Distribution (Pascal or Waiting-Time Distribution) 


Let $(\Omega, \mathcal{S}, P)$ be a probability space of a given statistical experiment, and let $A \in \mathcal{S}$ with $P(A) = p$. On any performance of the experiment, if $A$ happens we call it a success, otherwise a failure. Consider a succession of trials of this experiment, and let us compute the probability of observing exactly $r$ successes, where $r \ge 1$ is a fixed integer. If $X$ denotes the number of failures that precede the $r$th success, $X + r$ is the total number of replications needed to produce $r$ successes. This will happen if and only if the last trial results in a success and among the previous $(r + X - 1)$ trials there are exactly $X$ failures. It follows by independence that
$$P\{X = x\} = \binom{x + r - 1}{x}p^r(1 - p)^x, \qquad x = 0, 1, 2, \ldots. \tag{22}$$
Rewriting (22) in the form
$$P\{X = x\} = \binom{-r}{x}p^r(-q)^x, \qquad x = 0, 1, 2, \ldots;\quad q = 1 - p, \tag{23}$$
we see that
$$\sum_{x=0}^{\infty}\binom{-r}{x}(-q)^x = (1 - q)^{-r} = p^{-r}. \tag{24}$$
It follows that
$$\sum_{x=0}^{\infty}P\{X = x\} = 1.$$


Definition 1. For a fixed positive integer $r \ge 1$ and $0 < p < 1$, an RV with PMF given by (22) is said to have a negative binomial distribution. We use the notation $X \sim NB(r; p)$ to denote that $X$ has a negative binomial distribution.


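We may write
$$X = \sum_{x=0}^{\infty}x\,I_{[X = x]} \quad\text{and}\quad F(x) = \sum_{k=0}^{\infty}\binom{k + r - 1}{k}p^r(1 - p)^k\varepsilon(x - k).$$
For the MGF of $X$ we have
$$\begin{aligned} M(t) &= \sum_{x=0}^{\infty}\binom{x + r - 1}{x}e^{tx}p^rq^x \qquad (q = 1 - p)\\ &= p^r\sum_{x=0}^{\infty}(qe^t)^x\binom{x + r - 1}{x}\\ &= p^r(1 - qe^t)^{-r} \qquad \text{for } qe^t < 1. \end{aligned}\tag{25}$$
The PGF is given by $P(s) = p^r(1 - sq)^{-r}$, $|s| < 1$. Also,
$$\begin{aligned} EX &= \sum_{x=0}^{\infty}x\binom{x + r - 1}{x}p^rq^x = rp^r\sum_{x=1}^{\infty}\binom{x + r - 1}{x - 1}q^x\\ &= rp^rq(1 - q)^{-(r+1)} = \frac{rq}{p}. \end{aligned}\tag{26}$$
Similarly, we can show that
$$\operatorname{var}(X) = \frac{rq}{p^2}. \tag{27}$$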
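The moments (26) and (27) can be sanity-checked numerically. This is a sketch under our own assumptions: the series is truncated at 1000 terms (the neglected tail is negligible for the illustrative values $r = 3$, $p = 0.4$), and `nb_pmf` and `nb_moments` are hypothetical helper names.

```python
from math import comb

def nb_pmf(x, r, p):
    # (22): P{X = x} = C(x + r - 1, x) p^r (1 - p)^x
    return comb(x + r - 1, x) * p ** r * (1 - p) ** x

def nb_moments(r, p, terms=1000):
    # truncated sums for EX and EX^2; returns (mean, variance)
    mean = sum(x * nb_pmf(x, r, p) for x in range(terms))
    second = sum(x * x * nb_pmf(x, r, p) for x in range(terms))
    return mean, second - mean ** 2

r, p = 3, 0.4
q = 1 - p
nb_mean, nb_var = nb_moments(r, p)
print(nb_mean, r * q / p)        # both about 4.5
print(nb_var, r * q / p ** 2)    # both about 11.25
```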
If, however, we are interested in the distribution of the number of trials required to get $r$ successes, we have, writing $Y = X + r$,
$$P\{Y = y\} = \binom{y - 1}{r - 1}p^r(1 - p)^{y - r}, \qquad y = r, r + 1, \ldots, \tag{28}$$
$$EY = EX + r = \frac{r}{p},$$
$$\operatorname{var}(Y) = \operatorname{var}(X) = \frac{rq}{p^2}, \tag{29}$$
and
$$M_Y(t) = (pe^t)^r(1 - qe^t)^{-r} \qquad \text{for } qe^t < 1. \tag{30}$$


Let $X$ be a $b(n, p)$ RV, and let $Y$ be the RV defined in (28). If there are $r$ or more successes in the first $n$ trials, at most $n$ trials were required to obtain the first $r$ of these successes. We have
$$P\{X \ge r\} = P\{Y \le n\} \tag{31}$$
and also
$$P\{X < r\} = P\{Y > n\}. \tag{32}$$
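Identity (31) is exact, so it can be verified with rational arithmetic. In this sketch (helper names are ours; $p = 2/7$ is an arbitrary illustrative value) `binom_tail` sums the binomial PMF and `trials_cdf` sums (28).

```python
from fractions import Fraction
from math import comb

def binom_tail(n, r, p):
    # P{X >= r} for X ~ b(n, p)
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(r, n + 1))

def trials_cdf(n, r, p):
    # P{Y <= n} for Y the number of trials needed for r successes, PMF (28)
    return sum(comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r) for y in range(r, n + 1))

p = Fraction(2, 7)
for n in range(1, 9):
    for r in range(1, n + 1):
        assert binom_tail(n, r, p) == trials_cdf(n, r, p)
```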


In the special case when $r = 1$, the distribution of $X$ in (22) is given by
$$P\{X = x\} = pq^x, \qquad x = 0, 1, 2, \ldots. \tag{33}$$
An RV $X$ with PMF (33) is said to have a geometric distribution. Clearly, for the geometric distribution, we have
$$M(t) = p(1 - qe^t)^{-1}, \qquad EX = \frac{q}{p}, \qquad \operatorname{var}(X) = \frac{q}{p^2}. \tag{34}$$


Example 5 (Banach's Matchbox Problem). A mathematician carries one matchbox each in his right and left pockets. When he wants a match, he selects the left pocket with probability $p$ and the right pocket with probability $1 - p$. Suppose that initially each box contains $N$ matches. Consider the moment when the mathematician discovers that a box is empty. At that time the other box may contain $0, 1, 2, \ldots, N$ matches. Let us identify success with the choice of the left pocket. The left-pocket box will be found empty at the moment when the right-pocket box contains exactly $r$ matches if and only if exactly $N - r$ failures precede the $(N + 1)$st success. A similar argument applies to the right pocket, and we have
$$\begin{aligned} p_r &= \text{probability that the mathematician discovers a box empty while the other contains } r \text{ matches}\\ &= \binom{2N - r}{N}p^{N+1}(1 - p)^{N - r} + \binom{2N - r}{N}(1 - p)^{N+1}p^{N - r}, \qquad r = 0, 1, \ldots, N. \end{aligned}$$
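The probabilities $p_r$ of Example 5 should sum to 1, since exactly one of the two boxes is eventually found empty. A minimal Python sketch (the function name `banach_pmf` is ours) checks this exactly and evaluates $p_0$ for the symmetric case $p = 1/2$, $N = 50$.

```python
from fractions import Fraction
from math import comb

def banach_pmf(N, p):
    """[p_0, ..., p_N] from Example 5; p is the probability of choosing the left pocket."""
    q = 1 - p
    return [comb(2 * N - r, N) * (p ** (N + 1) * q ** (N - r) + q ** (N + 1) * p ** (N - r))
            for r in range(N + 1)]

assert sum(banach_pmf(5, Fraction(1, 3))) == 1      # exact, since p is a Fraction
print(float(banach_pmf(50, Fraction(1, 2))[0]))     # about 0.0796
```

For $p = 1/2$ the value $p_0$ reduces to $\binom{2N}{N}/2^{2N}$.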
Example 6. A fair die is rolled repeatedly. Let us compute the probability of the event $A$ that a 2 will show up before a 5. Let $A_j$ be the event that a 2 shows up on the $j$th trial $(j = 1, 2, \ldots)$ for the first time, and a 5 does not show up on the previous $j - 1$ trials. Then $P(A) = \sum_{j=1}^{\infty}P(A_j)$, where $P(A_j) = \frac{1}{6}\left(\frac{4}{6}\right)^{j-1}$. It follows that
$$P(A) = \sum_{j=1}^{\infty}\frac{1}{6}\left(\frac{4}{6}\right)^{j-1} = \frac{1}{6}\cdot\frac{1}{1 - 4/6} = \frac{1}{2}.$$
Similarly, the probability that a 2 will show up before a 5 or a 6 is $\frac{1}{3}$, and so on.


Theorem 2. Let $X_1, X_2, \ldots, X_k$ be independent RVs with $X_i \sim NB(r_i; p)$, $i = 1, 2, \ldots, k$. Then $S_k = \sum_{i=1}^{k}X_i$ is distributed as $NB(r_1 + r_2 + \cdots + r_k; p)$.

Corollary. If $X_1, X_2, \ldots, X_k$ are iid geometric RVs, then $S_k$ is an $NB(k; p)$ RV.


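Theorem 3. Let $X$ and $Y$ be independent RVs with PMFs $NB(r_1; p)$ and $NB(r_2; p)$, respectively. Then the conditional PMF of $X$, given $X + Y = t$, is expressed by
$$P\{X = x \mid X + Y = t\} = \frac{\binom{x + r_1 - 1}{x}\binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \qquad x = 0, 1, \ldots, t.$$
If, in particular, $r_1 = r_2 = 1$, the conditional distribution is uniform on $t + 1$ points.

Proof. By Theorem 2, $X + Y$ is an $NB(r_1 + r_2; p)$ RV. Thus
$$\begin{aligned} P\{X = x \mid X + Y = t\} &= \frac{P\{X = x, Y = t - x\}}{P\{X + Y = t\}}\\ &= \frac{\binom{x + r_1 - 1}{x}p^{r_1}(1 - p)^x\binom{t - x + r_2 - 1}{t - x}p^{r_2}(1 - p)^{t - x}}{\binom{t + r_1 + r_2 - 1}{t}p^{r_1 + r_2}(1 - p)^t}\\ &= \frac{\binom{x + r_1 - 1}{x}\binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}. \end{aligned}$$
If $r_1 = r_2 = 1$, that is, if $X$ and $Y$ are independent geometric RVs, then
$$P\{X = x \mid X + Y = t\} = \frac{1}{t + 1}, \qquad x = 0, 1, \ldots, t;\ t = 0, 1, 2, \ldots. \tag{35}$$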
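Theorem 3 can be checked without appealing to Theorem 2. The sketch below (hypothetical helper names; $p = 3/10$ chosen for illustration) computes $P\{X + Y = t\}$ by direct convolution and compares the resulting conditional PMF with the closed form, including the uniform case $r_1 = r_2 = 1$.

```python
from fractions import Fraction
from math import comb

def nb_pmf(x, r, p):
    # (22): P{X = x} = C(x + r - 1, x) p^r (1 - p)^x
    return comb(x + r - 1, x) * p ** r * (1 - p) ** x

def conditional_from_joint(x, t, r1, r2, p):
    # P{X = x | X + Y = t}, with P{X + Y = t} obtained by convolution
    joint = nb_pmf(x, r1, p) * nb_pmf(t - x, r2, p)
    total = sum(nb_pmf(j, r1, p) * nb_pmf(t - j, r2, p) for j in range(t + 1))
    return joint / total

def conditional_formula(x, t, r1, r2):
    # closed form stated in Theorem 3
    return Fraction(comb(x + r1 - 1, x) * comb(t - x + r2 - 1, t - x),
                    comb(t + r1 + r2 - 1, t))

p = Fraction(3, 10)
for r1, r2 in [(2, 3), (1, 1)]:
    for t in range(6):
        for x in range(t + 1):
            assert conditional_from_joint(x, t, r1, r2, p) == conditional_formula(x, t, r1, r2)
```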
Theorem 4 (Chatterji [12]). Let $X$ and $Y$ be iid RVs, and let
$$P\{X = k\} = p_k > 0, \qquad k = 0, 1, 2, \ldots.$$
If
$$P\{X = t \mid X + Y = t\} = P\{X = t - 1 \mid X + Y = t\} = \frac{1}{t + 1}, \qquad t \ge 1, \tag{36}$$
then $X$ and $Y$ are geometric RVs.


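Proof. We have
$$P\{X = t \mid X + Y = t\} = \frac{p_tp_0}{\sum_{k=0}^{t}p_kp_{t-k}} = \frac{1}{t + 1} \tag{37}$$
and
$$P\{X = t - 1 \mid X + Y = t\} = \frac{p_{t-1}p_1}{\sum_{k=0}^{t}p_kp_{t-k}} = \frac{1}{t + 1}. \tag{38}$$
It follows that
$$\frac{p_t}{p_{t-1}} = \frac{p_1}{p_0},$$
and by iteration $p_t = (p_1/p_0)^tp_0$. Since $\sum_{t=0}^{\infty}p_t = 1$, we must have $p_1/p_0 < 1$. Moreover,
$$1 = p_0\sum_{t=0}^{\infty}\left(\frac{p_1}{p_0}\right)^t = \frac{p_0}{1 - (p_1/p_0)},$$
so that $p_1/p_0 = 1 - p_0$, and the proof is complete.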


Theorem 5. If $X$ has a geometric distribution, then for any two nonnegative integers $m$ and $n$,
$$P\{X > m + n \mid X > m\} = P\{X \ge n\}. \tag{39}$$
The proof is left as an exercise.
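Because strict and non-strict inequalities are easy to confuse here, a small exact check may help. The following sketch (our own helper names; geometric PMF (33) with the illustrative value $p = 1/4$) confirms $P\{X > m + n \mid X > m\} = P\{X \ge n\}$ and shows that writing $P\{X > n\}$ on the right-hand side would fail.

```python
from fractions import Fraction

def geom_pmf(k, p):
    # (33): P{X = k} = p q^k, k = 0, 1, 2, ...
    return p * (1 - p) ** k

def prob_at_least(n, p):
    # P{X >= n} = 1 - sum_{k < n} P{X = k}, computed exactly
    return 1 - sum((geom_pmf(k, p) for k in range(n)), Fraction(0))

def prob_greater(m, p):
    # P{X > m} = P{X >= m + 1} for an integer-valued RV
    return prob_at_least(m + 1, p)

p = Fraction(1, 4)
for m in range(6):
    for n in range(6):
        conditional = prob_greater(m + n, p) / prob_greater(m, p)
        assert conditional == prob_at_least(n, p)   # the identity (39)
        assert conditional != prob_greater(n, p)    # the ">" form on the right fails
```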


Remark 1. Theorem 5 says that the geometric distribution has no memory; that 
is, the information of no successes in m trials is forgotten in subsequent calculations. 


The converse of Theorem 5 is also true. 


Theorem 6. Let $X$ be a nonnegative integer-valued RV satisfying
$$P\{X > m + 1 \mid X > m\} = P\{X \ge 1\}$$
for any nonnegative integer $m$. Then $X$ must have a geometric distribution.

Proof. Let the PMF of $X$ be written as
$$P\{X = k\} = p_k, \qquad k = 0, 1, 2, \ldots.$$
Then
$$P\{X \ge n\} = \sum_{k=n}^{\infty}p_k$$
and
$$P\{X > m\} = \sum_{k=m+1}^{\infty}p_k = q_m, \text{ say},$$
so that
$$\frac{P\{X > m + 1\}}{P\{X > m\}} = \frac{q_{m+1}}{q_m} = P\{X \ge 1\}.$$
Thus
$$q_{m+1} = q_mq_0,$$
where $q_0 = P\{X > 0\} = p_1 + p_2 + \cdots = 1 - p_0$. It follows that $q_k = (1 - p_0)^{k+1}$, and hence $p_k = q_{k-1} - q_k = (1 - p_0)^kp_0$, as asserted.
Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent geometric RVs with parameters $p_1, p_2, \ldots, p_n$, respectively. Then $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ is also a geometric RV with parameter
$$p = 1 - \prod_{i=1}^{n}(1 - p_i).$$
The proof is left as an exercise.
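Theorem 7 can be illustrated with exact arithmetic. In this sketch (helper names and the parameters $1/2, 1/3, 1/5$ are our own choices) the PMF of the minimum is obtained from $P\{X_{(1)} \ge k\} = \prod_i P\{X_i \ge k\}$ and compared with a geometric PMF whose parameter is $1 - \prod_i(1 - p_i)$.

```python
from fractions import Fraction
from math import prod

def prob_at_least(k, p):
    # P{X >= k} for a geometric RV with PMF p q^x, x = 0, 1, 2, ...
    return 1 - sum((p * (1 - p) ** j for j in range(k)), Fraction(0))

def min_pmf(k, ps):
    # P{X_(1) = k} = P{all X_i >= k} - P{all X_i >= k + 1}, using independence
    return prod(prob_at_least(k, p) for p in ps) - prod(prob_at_least(k + 1, p) for p in ps)

ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
p_min = 1 - prod(1 - p for p in ps)   # parameter claimed by Theorem 7
for k in range(8):
    assert min_pmf(k, ps) == p_min * (1 - p_min) ** k
```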


Corollary. The iid RVs $X_1, X_2, \ldots, X_n$ are $NB(1; p)$ if and only if $X_{(1)}$ is a geometric RV with parameter $1 - (1 - p)^n$.

Proof. The necessity follows from Theorem 7. For the sufficiency part of the proof, let
$$P\{X_{(1)} \le k\} = 1 - P\{X_{(1)} > k\} = 1 - (1 - p)^{n(k+1)}.$$
But
$$P\{X_{(1)} \le k\} = 1 - P\{X_1 > k, X_2 > k, \ldots, X_n > k\} = 1 - [1 - F(k)]^n,$$
where $F$ is the common DF of $X_1, X_2, \ldots, X_n$. It follows that
$$1 - F(k) = (1 - p)^{k+1},$$
so that $P\{X_1 > k\} = (1 - p)^{k+1}$, which completes the proof.


5.2.6 Hypergeometric Distribution 


A box contains $N$ marbles. Of these, $M$ are drawn at random, marked, and returned to the box. The contents of the box are then thoroughly mixed. Next, $n$ marbles are drawn at random from the box, and the marked marbles are counted. If $X$ denotes the number of marked marbles, then
$$P\{X = x\} = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}. \tag{40}$$
Since $x$ cannot exceed $M$ or $n$, we must have
$$x \le \min(M, n). \tag{41}$$
Also, $x \ge 0$ and $N - M \ge n - x$, so that
$$x \ge \max(0, M + n - N). \tag{42}$$


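Note that
$$\sum_{k=0}^{n}\binom{a}{k}\binom{b}{n - k} = \binom{a + b}{n}$$
for arbitrary numbers $a, b$ and positive integer $n$. It follows that
$$\sum_{x}P\{X = x\} = \frac{1}{\binom{N}{n}}\sum_{x}\binom{M}{x}\binom{N - M}{n - x} = 1.$$

Definition 2. An RV $X$ with PMF given by (40) is called a hypergeometric RV.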


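It is easy to check that
$$EX = \frac{n}{N}M, \tag{43}$$
$$EX^2 = \frac{M(M - 1)}{N(N - 1)}n(n - 1) + \frac{nM}{N}, \tag{44}$$
and
$$\operatorname{var}(X) = \frac{nM(N - M)(N - n)}{N^2(N - 1)}. \tag{45}$$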
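Formulas (43) and (45) can be confirmed by summing over the support given by (41) and (42). This is a sketch with illustrative values $N = 20$, $M = 7$, $n = 6$ of our own choosing; `hypergeom_pmf` is a hypothetical helper, not a library function.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, M, n):
    # (40): C(M, x) C(N - M, n - x) / C(N, n)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

N, M, n = 20, 7, 6
support = range(max(0, M + n - N), min(M, n) + 1)   # (41) and (42)
pmf = {x: hypergeom_pmf(x, N, M, n) for x in support}
mean = sum(x * w for x, w in pmf.items())
var = sum(x * x * w for x, w in pmf.items()) - mean ** 2

assert sum(pmf.values()) == 1
assert mean == Fraction(n * M, N)                                   # (43)
assert var == Fraction(n * M * (N - M) * (N - n), N ** 2 * (N - 1))  # (45)
```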
Example 7. A lot consisting of 50 bulbs is inspected by taking at random 10 bulbs and testing them. If the number of defective bulbs is at most 1, the lot is accepted; otherwise, it is rejected. If there are, in fact, 10 defective bulbs in the lot, the probability of accepting the lot is
$$\frac{\binom{10}{0}\binom{40}{10}}{\binom{50}{10}} + \frac{\binom{10}{1}\binom{40}{9}}{\binom{50}{10}} = 0.3487.$$
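The numerical value in Example 7 can be reproduced with Python's `math.comb` (available from Python 3.8); this is simply the two-term sum above evaluated in floating point.

```python
from math import comb

# accept if at most one defective among 10 bulbs drawn from 50 containing 10 defectives
accept = (comb(10, 0) * comb(40, 10) + comb(10, 1) * comb(40, 9)) / comb(50, 10)
print(round(accept, 4))  # 0.3487
```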
Example 8. Suppose that an urn contains $b$ white and $c$ black balls, $b + c = N$. A ball is drawn at random, and before drawing the next ball, $s + 1$ balls of the same color are added to the urn. The procedure is repeated $n$ times. Let $X$ be the number of white balls drawn in $n$ draws, $X = 0, 1, 2, \ldots, n$. We shall find the PMF of $X$.

First note that the probability of drawing $k$ white balls in successive draws is
$$\frac{b}{N}\cdot\frac{b + s}{N + s}\cdot\frac{b + 2s}{N + 2s}\cdots\frac{b + (k - 1)s}{N + (k - 1)s},$$
and the probability of drawing $k$ white balls in the first $k$ draws and then $n - k$ black balls in the next $n - k$ draws is
$$p_k = \frac{b}{N}\cdot\frac{b + s}{N + s}\cdots\frac{b + (k - 1)s}{N + (k - 1)s}\cdot\frac{c}{N + ks}\cdot\frac{c + s}{N + (k + 1)s}\cdots\frac{c + (n - k - 1)s}{N + (n - 1)s}. \tag{46}$$
Here $p_k$ also gives the probability of drawing $k$ white and $n - k$ black balls in any given order. It follows that
$$P\{X = k\} = \binom{n}{k}p_k. \tag{47}$$
An RV $X$ with PMF given by (47) is said to have a Polya distribution. Let us write
$$Np = b, \qquad N(1 - p) = c, \qquad \text{and} \qquad \frac{s}{N} = \alpha.$$
Then with $q = 1 - p$, we have
$$P\{X = k\} = \binom{n}{k}\frac{p(p + \alpha)\cdots[p + (k - 1)\alpha]\;q(q + \alpha)\cdots[q + (n - k - 1)\alpha]}{1(1 + \alpha)\cdots[1 + (n - 1)\alpha]}.$$


Let us take $s = -1$. This means that the ball drawn at each draw is not replaced in the urn before drawing the next ball. In this case $\alpha = -1/N$, and we have
$$\begin{aligned} P\{X = k\} &= \binom{n}{k}\frac{Np(Np - 1)\cdots[Np - (k - 1)]\;Nq(Nq - 1)\cdots[Nq - (n - k - 1)]}{N(N - 1)\cdots[N - (n - 1)]}\\ &= \frac{\binom{Np}{k}\binom{Nq}{n - k}}{\binom{N}{n}}, \end{aligned}\tag{48}$$
which is a hypergeometric distribution. Here
$$\max(0, n - Nq) \le k \le \min(n, Np). \tag{49}$$
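The passage from the Polya PMF (47) to the hypergeometric PMF (48) can be checked exactly. The sketch below is ours (the helper `polya_pmf` and the urn with $b = 6$, $c = 9$, $n = 5$ are illustrative assumptions); it builds $p_k$ from (46), confirms that the PMF sums to 1 for several values of $s$, and checks the cases $s = -1$ (hypergeometric) and $s = 0$ (binomial with $p = b/N$).

```python
from fractions import Fraction
from math import comb

def polya_pmf(k, n, b, c, s):
    """P{X = k} = C(n, k) p_k with p_k from (46); b white, c black, s as in Example 8."""
    N = b + c
    num = 1
    for i in range(k):          # white draws: b, b + s, ..., b + (k - 1)s
        num *= b + i * s
    for j in range(n - k):      # black draws: c, c + s, ..., c + (n - k - 1)s
        num *= c + j * s
    den = 1
    for i in range(n):          # urn sizes: N, N + s, ..., N + (n - 1)s
        den *= N + i * s
    return comb(n, k) * Fraction(num, den)

b, c, n = 6, 9, 5
N = b + c
for s in (-1, 0, 2):
    assert sum(polya_pmf(k, n, b, c, s) for k in range(n + 1)) == 1
for k in range(n + 1):
    # s = -1: no replacement, hypergeometric (48)
    assert polya_pmf(k, n, b, c, -1) == Fraction(comb(b, k) * comb(c, n - k), comb(N, n))
    # s = 0: replacement only, binomial b(n, b/N)
    assert polya_pmf(k, n, b, c, 0) == comb(n, k) * Fraction(b, N) ** k * Fraction(c, N) ** (n - k)
```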


Theorem 8. Let X and Y be independent RVs with PMFs b(m, p) and b(n, p), 
respectively. Then the conditional distribution of X, given X + Y, is hypergeometric. 


5.2.7 Negative Hypergeometric Distribution 


Consider the model of Section 5.2.6. A box contains $N$ marbles; $M$ of these are marked (or, say, defective) and $N - M$ are unmarked. A sample of size $n$ is taken, and let $X$ denote the number of defective marbles in the sample. If the sample is drawn without replacement, we saw that $X$ has a hypergeometric distribution with PMF (40). If, on the other hand, the sample is drawn with replacement, then $X \sim b(n, p)$ where $p = M/N$.

Let $Y$ denote the number of draws needed to draw the $r$th defective marble. If the draws are made with replacement, then $Y$ has the negative binomial distribution given in (28) with $p = M/N$. What if the draws are made without replacement? In that case, in order that the $k$th draw $(k \ge r)$ be the $r$th defective marble drawn, the $k$th draw must produce a defective marble, whereas the previous $k - 1$ draws must produce $r - 1$ defectives. It follows that
$$P\{Y = k\} = \frac{\binom{M}{r - 1}\binom{N - M}{k - r}}{\binom{N}{k - 1}}\cdot\frac{M - r + 1}{N - k + 1}$$
for $k = r, r + 1, \ldots, N$. Rewriting, we see that
$$P\{Y = k\} = \frac{\binom{k - 1}{r - 1}\binom{N - k}{M - r}}{\binom{N}{M}}. \tag{50}$$
An RV $Y$ with PMF (50) is said to have a negative hypergeometric distribution.
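It is easy to see that
$$EY = r\frac{N + 1}{M + 1}, \qquad EY(Y + 1) = r(r + 1)\frac{(N + 1)(N + 2)}{(M + 1)(M + 2)},$$
and
$$\operatorname{var}(Y) = \frac{r(N - M)(N + 1)(M + 1 - r)}{(M + 1)^2(M + 2)}.$$
Also, if $r/N \to 0$ and $k/N \to 0$ as $N \to \infty$ (with $M/N = p$ held fixed), then
$$\frac{\binom{k - 1}{r - 1}\binom{N - k}{M - r}}{\binom{N}{M}} \to \binom{k - 1}{r - 1}p^r(1 - p)^{k - r},$$
which is (22) written in terms of the number of trials, that is, (28).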
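A short exact check of (50) and of the moment formulas, assuming illustrative values $N = 15$, $M = 6$, $r = 3$ (the helper names are ours); it also confirms that the two expressions for $P\{Y = k\}$ agree.

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(k, N, M, r):
    # (50): P{Y = k} = C(k - 1, r - 1) C(N - k, M - r) / C(N, M)
    return Fraction(comb(k - 1, r - 1) * comb(N - k, M - r), comb(N, M))

def sequential_form(k, N, M, r):
    # r - 1 defectives in the first k - 1 draws, then a defective on draw k
    return (Fraction(comb(M, r - 1) * comb(N - M, k - r), comb(N, k - 1))
            * Fraction(M - r + 1, N - k + 1))

N, M, r = 15, 6, 3
pmf = {k: neg_hypergeom_pmf(k, N, M, r) for k in range(r, N + 1)}
EY = sum(k * w for k, w in pmf.items())
varY = sum(k * k * w for k, w in pmf.items()) - EY ** 2

assert sum(pmf.values()) == 1
assert all(pmf[k] == sequential_form(k, N, M, r) for k in pmf)
assert EY == Fraction(r * (N + 1), M + 1)
assert varY == Fraction(r * (N - M) * (N + 1) * (M + 1 - r), (M + 1) ** 2 * (M + 2))
```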


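5.2.8 Poisson Distribution
Definition 3. An RV $X$ is said to be a Poisson RV with parameter $\lambda > 0$ if its PMF is given by
$$P\{X = k\} = \frac{e^{-\lambda}\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots. \tag{51}$$
We first check to see that (51) indeed defines a PMF. We have
$$\sum_{k=0}^{\infty}\frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}e^{\lambda} = 1.$$
If $X$ has the PMF given by (51), we will write $X \sim P(\lambda)$. Clearly,
$$X = \sum_{k=0}^{\infty}k\,I_{[X = k]}$$
and
$$F(x) = \sum_{k=0}^{\infty}\frac{e^{-\lambda}\lambda^k}{k!}\varepsilon(x - k).$$
The mean and the variance are given by (see Problem 3.2.9)
$$EX = \lambda, \qquad EX^2 = \lambda + \lambda^2, \tag{52}$$
and
$$\operatorname{var}(X) = \lambda. \tag{53}$$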
The MGF of $X$ is given by (see Example 3.3.7)
$$Ee^{tX} = \exp[\lambda(e^t - 1)] \tag{54}$$
and the PGF by $P(s) = e^{-\lambda(1 - s)}$, $|s| < 1$.

Theorem 9. Let $X_1, X_2, \ldots, X_n$ be independent Poisson RVs with $X_k \sim P(\lambda_k)$, $k = 1, 2, \ldots, n$. Then $S_n = X_1 + X_2 + \cdots + X_n$ is a $P(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$ RV.


The converse of Theorem 9 is also true. Indeed, Raikov [82] showed that if $X_1, X_2, \ldots, X_n$ are independent and $S_n = \sum_{i=1}^{n}X_i$ has a Poisson distribution, each of the RVs $X_1, X_2, \ldots, X_n$ has a Poisson distribution.


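Example 9. The number of female insects in a given region follows a Poisson distribution with mean $\lambda$. The number of eggs laid by each insect is a $P(\mu)$ RV. We are interested in the probability distribution of the number of eggs in the region.

Let $F$ be the number of female insects in the given region. Then
$$P\{F = f\} = \frac{e^{-\lambda}\lambda^f}{f!}, \qquad f = 0, 1, 2, \ldots.$$
Let $Y$ be the total number of eggs laid by the insects in the region. Given $F = f$, $Y$ is the sum of $f$ independent $P(\mu)$ RVs, hence a $P(f\mu)$ RV by Theorem 9. Then
$$P\{Y = y, F = f\} = P\{F = f\}P\{Y = y \mid F = f\} = \frac{e^{-\lambda}\lambda^f}{f!}\cdot\frac{e^{-f\mu}(f\mu)^y}{y!},$$
so that
$$P\{Y = y\} = \frac{e^{-\lambda}\mu^y}{y!}\sum_{f=0}^{\infty}\frac{(\lambda e^{-\mu})^ff^y}{f!}.$$
The MGF of $Y$ is given by
$$\begin{aligned} M(t) &= \sum_{f=0}^{\infty}\frac{e^{-\lambda}\lambda^f}{f!}\sum_{y=0}^{\infty}e^{ty}\frac{e^{-f\mu}(f\mu)^y}{y!} = \sum_{f=0}^{\infty}\frac{e^{-\lambda}\lambda^f}{f!}\exp[f\mu(e^t - 1)]\\ &= e^{-\lambda}\exp\left[\lambda e^{\mu(e^t - 1)}\right] = \exp\left\{\lambda\left[e^{\mu(e^t - 1)} - 1\right]\right\}. \end{aligned}$$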
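Theorem 10. Let $X$ and $Y$ be independent RVs with PMFs $P(\lambda_1)$ and $P(\lambda_2)$, respectively. Then the conditional distribution of $X$, given $X + Y$, is binomial.

Proof. For nonnegative integers $m$ and $n$, $m \le n$, we have
$$\begin{aligned} P\{X = m \mid X + Y = n\} &= \frac{P\{X = m, Y = n - m\}}{P\{X + Y = n\}} = \frac{e^{-\lambda_1}(\lambda_1^m/m!)\,e^{-\lambda_2}\left(\lambda_2^{n-m}/(n - m)!\right)}{e^{-(\lambda_1 + \lambda_2)}(\lambda_1 + \lambda_2)^n/n!}\\ &= \binom{n}{m}\frac{\lambda_1^m\lambda_2^{n-m}}{(\lambda_1 + \lambda_2)^n} = \binom{n}{m}\left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^m\left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{n-m}, \qquad m = 0, 1, 2, \ldots, n, \end{aligned}$$
and the proof is complete.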


Remark 2. The converse of this result is also true in the following sense. If $X$ and $Y$ are independent nonnegative integer-valued RVs such that $P\{X = k\} > 0$, $P\{Y = k\} > 0$ for $k = 0, 1, 2, \ldots$, and the conditional distribution of $X$, given $X + Y$, is binomial, then both $X$ and $Y$ are Poisson. This result is due to Chatterji [12]. For the proof, see Problem 13.


Theorem 11. If $X \sim P(\lambda)$ and the conditional distribution of $Y$, given $X = x$, is $b(x, p)$, then $Y$ is a $P(\lambda p)$ RV.
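Theorem 11 (thinning of a Poisson RV) can be checked numerically by summing $P\{X = x\}\binom{x}{y}p^y(1 - p)^{x - y}$ over $x$. In this sketch the series is truncated at 150 terms, which is our own choice and is ample for the illustrative value $\lambda = 3$; the helper names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    # (51): e^{-lam} lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

def thinned_pmf(y, lam, p, terms=150):
    # P{Y = y} = sum_{x >= y} P{X = x} C(x, y) p^y (1 - p)^{x - y}, truncated at x < terms
    return sum(poisson_pmf(x, lam) * math.comb(x, y) * p ** y * (1 - p) ** (x - y)
               for x in range(y, terms))

lam, p = 3.0, 0.4
for y in range(10):
    assert abs(thinned_pmf(y, lam, p) - poisson_pmf(y, lam * p)) < 1e-12
```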


Example 10 (Lamperti and Kruskal [58]). Let $N$ be a nonnegative integer-valued RV. Independently of each other, the $N$ balls are placed either in urn $A$ with probability $p$ $(0 < p < 1)$ or in urn $B$ with probability $1 - p$, resulting in $N_A$ balls in urn $A$ and $N_B = N - N_A$ balls in urn $B$. We will show that the RVs $N_A$ and $N_B$ are independent if and only if $N$ has a Poisson distribution. We have
$$P\{N_A = a \text{ and } N_B = b \mid N = a + b\} = \binom{a + b}{a}p^a(1 - p)^b,$$
where $a, b$ are integers $\ge 0$. Thus
$$P\{N_A = a, N_B = b\} = \binom{a + b}{a}p^aq^bP\{N = n\}, \qquad q = 1 - p,\ n = a + b.$$
If $N$ has a Poisson $(\lambda)$ distribution, then
$$P\{N_A = a, N_B = b\} = \frac{(a + b)!}{a!\,b!}p^aq^be^{-\lambda}\frac{\lambda^{a+b}}{(a + b)!} = \left[e^{-\lambda p}\frac{(\lambda p)^a}{a!}\right]\left[e^{-\lambda q}\frac{(\lambda q)^b}{b!}\right],$$
so that $N_A$ and $N_B$ are independent.

Conversely, if $N_A$ and $N_B$ are independent, then
$$P\{N = n\}\,n! = f(a)g(b)$$
for some functions $f$ and $g$. Clearly, $f(0) \ne 0$, $g(0) \ne 0$ because $P\{N_A = 0, N_B = 0\} > 0$. Thus there is a function $h$ such that $h(a + b) = f(a)g(b)$ for all nonnegative integers $a, b$. It follows that
$$h(1) = f(1)g(0) = f(0)g(1),$$
$$h(2) = f(2)g(0) = f(1)g(1) = f(0)g(2),$$
and so on. By induction,
$$f(a) = f(0)\left[\frac{g(1)}{g(0)}\right]^a, \qquad g(b) = g(0)\left[\frac{f(1)}{f(0)}\right]^b.$$
Since $g(1)/g(0) = f(1)/f(0)$ by the first of these relations, we may write, for some $\alpha_1, \alpha_2, \lambda$,
$$f(a) = \alpha_1\lambda^a, \qquad g(b) = \alpha_2\lambda^b,$$
and
$$P\{N = n\} = \alpha_1\alpha_2\frac{\lambda^{a+b}}{n!} = \alpha_1\alpha_2\frac{\lambda^n}{n!},$$
so that $N$ is a Poisson RV.
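The dichotomy in Example 10 can be seen numerically. In this sketch (our own helper names; marginals are computed from truncated sums, and the geometric PMF $(1 - \theta)\theta^n$ serves only as an illustrative non-Poisson choice for $N$) the largest gap $|P\{N_A = a, N_B = b\} - P\{N_A = a\}P\{N_B = b\}|$ over a small grid is essentially zero for Poisson $N$ and clearly positive otherwise.

```python
import math

def poisson(lam):
    return lambda n: math.exp(-lam) * lam ** n / math.factorial(n)

def geometric(theta):
    return lambda n: (1 - theta) * theta ** n

def joint(a, b, p, pmf_N):
    # P{N_A = a, N_B = b} = C(a + b, a) p^a (1 - p)^b P{N = a + b}
    n = a + b
    return math.comb(n, a) * p ** a * (1 - p) ** b * pmf_N(n)

def max_dependence_gap(p, pmf_N, grid=6, terms=80):
    # marginals of N_A and N_B from truncated sums over the other coordinate
    marg_A = [sum(joint(a, b, p, pmf_N) for b in range(terms)) for a in range(grid)]
    marg_B = [sum(joint(a, b, p, pmf_N) for a in range(terms)) for b in range(grid)]
    return max(abs(joint(a, b, p, pmf_N) - marg_A[a] * marg_B[b])
               for a in range(grid) for b in range(grid))

print(max_dependence_gap(0.3, poisson(2.0)))     # essentially zero
print(max_dependence_gap(0.5, geometric(0.5)))   # clearly positive
```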


5.2.9 Multinomial Distribution

The binomial distribution is generalized in the following natural fashion. Suppose that an experiment is repeated $n$ times. Each replication of the experiment terminates in one of $k$ mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let $p_j$ be the probability that the experiment terminates in $A_j$, $j = 1, 2, \ldots, k$, and suppose that $p_j$ $(j = 1, 2, \ldots, k)$ remains constant for all $n$ replications. We assume that the $n$ replications are independent.

Let $x_1, x_2, \ldots, x_{k-1}$ be nonnegative integers such that $x_1 + x_2 + \cdots + x_{k-1} \le n$. Then the probability that exactly $x_i$ trials terminate in $A_i$, $i = 1, 2, \ldots, k - 1$, and hence that $x_k = n - (x_1 + x_2 + \cdots + x_{k-1})$ trials terminate in $A_k$, is clearly
$$\frac{n!}{x_1!\,x_2!\cdots x_k!}p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}.$$




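If $(X_1, X_2, \ldots, X_k)$ is a random vector such that $X_j = x_j$ means that event $A_j$ has occurred $x_j$ times, $x_j = 0, 1, 2, \ldots, n$, the joint PMF of $(X_1, X_2, \ldots, X_k)$ is given by
$$P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \begin{cases}\dfrac{n!}{x_1!\,x_2!\cdots x_k!}p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k} & \text{if } n = \sum_{i=1}^{k}x_i,\\ 0 & \text{otherwise}.\end{cases}\tag{55}$$

Definition 4. An RV $(X_1, X_2, \ldots, X_{k-1})$ with joint PMF given by
$$P\{X_1 = x_1, \ldots, X_{k-1} = x_{k-1}\} = \begin{cases}\dfrac{n!}{x_1!\,x_2!\cdots x_{k-1}!\,(n - x_1 - \cdots - x_{k-1})!}p_1^{x_1}p_2^{x_2}\cdots p_{k-1}^{x_{k-1}}p_k^{n - x_1 - \cdots - x_{k-1}} & \text{if } x_1 + x_2 + \cdots + x_{k-1} \le n,\\ 0 & \text{otherwise},\end{cases}\tag{56}$$
is said to have a multinomial distribution.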
For the MGF of $(X_1, X_2, \ldots, X_{k-1})$ we have
$$\begin{aligned} M(t_1, t_2, \ldots, t_{k-1}) &= E\exp(t_1X_1 + t_2X_2 + \cdots + t_{k-1}X_{k-1})\\ &= \sum_{\substack{x_1, \ldots, x_{k-1} \ge 0\\ x_1 + \cdots + x_{k-1} \le n}}\frac{n!}{x_1!\,x_2!\cdots x_k!}(p_1e^{t_1})^{x_1}(p_2e^{t_2})^{x_2}\cdots(p_{k-1}e^{t_{k-1}})^{x_{k-1}}p_k^{x_k}\\ &= (p_1e^{t_1} + p_2e^{t_2} + \cdots + p_{k-1}e^{t_{k-1}} + p_k)^n \end{aligned}\tag{57}$$
for all $t_1, t_2, \ldots, t_{k-1} \in \mathbb{R}$, where $x_k = n - x_1 - \cdots - x_{k-1}$. Clearly,
$$M(t_1, 0, 0, \ldots, 0) = (p_1e^{t_1} + p_2 + \cdots + p_k)^n = (1 - p_1 + p_1e^{t_1})^n,$$
which is binomial. Indeed, the marginal PMF of each $X_i$, $i = 1, 2, \ldots, k - 1$, is binomial. Similarly, the joint MGF of $X_i, X_j$ $(i, j = 1, 2, \ldots, k - 1;\ i \ne j)$ is
$$M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = [p_ie^{t_i} + p_je^{t_j} + 1 - p_i - p_j]^n,$$
which is the MGF of a trinomial distribution with PMF
$$P\{X_i = x_i, X_j = x_j\} = \frac{n!}{x_i!\,x_j!\,(n - x_i - x_j)!}p_i^{x_i}p_j^{x_j}(1 - p_i - p_j)^{n - x_i - x_j}. \tag{58}$$
Note that the RVs $X_1, X_2, \ldots, X_{k-1}$ are dependent.
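From the MGF of $(X_1, X_2, \ldots, X_{k-1})$ or directly from the marginal PMFs we can compute the moments. Thus
$$EX_j = np_j \quad\text{and}\quad \operatorname{var}(X_j) = np_j(1 - p_j), \qquad j = 1, 2, \ldots, k - 1, \tag{59}$$
and for $i, j = 1, 2, \ldots, k - 1$ and $i \ne j$,
$$\operatorname{cov}(X_i, X_j) = E[(X_i - np_i)(X_j - np_j)] = -np_ip_j. \tag{60}$$
It follows that the correlation coefficient between $X_i$ and $X_j$ is given by
$$\rho_{ij} = -\left[\frac{p_ip_j}{(1 - p_i)(1 - p_j)}\right]^{1/2}, \qquad i, j = 1, 2, \ldots, k - 1\ (i \ne j). \tag{61}$$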
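For $k = 3$ the moment formulas (59) and (60) can be confirmed by exact enumeration of the trinomial PMF. The sketch below uses illustrative values $n = 7$, $p_1 = 1/5$, $p_2 = 1/3$ of our own choosing; `trinomial_pmf` is a hypothetical helper.

```python
from fractions import Fraction
from math import factorial

def trinomial_pmf(x, y, n, p1, p2):
    # (56) with k = 3: n!/(x! y! (n-x-y)!) p1^x p2^y p3^(n-x-y), p3 = 1 - p1 - p2
    p3 = 1 - p1 - p2
    coeff = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return coeff * p1 ** x * p2 ** y * p3 ** (n - x - y)

n, p1, p2 = 7, Fraction(1, 5), Fraction(1, 3)
cells = [(x, y) for x in range(n + 1) for y in range(n + 1 - x)]
EX = sum(x * trinomial_pmf(x, y, n, p1, p2) for x, y in cells)
EY = sum(y * trinomial_pmf(x, y, n, p1, p2) for x, y in cells)
EXY = sum(x * y * trinomial_pmf(x, y, n, p1, p2) for x, y in cells)
VX = sum(x * x * trinomial_pmf(x, y, n, p1, p2) for x, y in cells) - EX ** 2

assert EX == n * p1 and EY == n * p2          # (59), means
assert VX == n * p1 * (1 - p1)                # (59), variance
assert EXY - EX * EY == -n * p1 * p2          # (60)
```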
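Example 11. Consider the trinomial distribution with PMF
$$P\{X = x, Y = y\} = \frac{n!}{x!\,y!\,(n - x - y)!}p_1^xp_2^yp_3^{n - x - y},$$
where $x, y$ are nonnegative integers such that $x + y \le n$, and $p_1, p_2, p_3 > 0$ with $p_1 + p_2 + p_3 = 1$. The marginal PMF of $X$ is given by
$$P\{X = x\} = \binom{n}{x}p_1^x(1 - p_1)^{n - x}, \qquad x = 0, 1, 2, \ldots, n.$$
It follows that
$$P\{Y = y \mid X = x\} = \begin{cases}\dfrac{(n - x)!}{y!\,(n - x - y)!}\left(\dfrac{p_2}{1 - p_1}\right)^y\left(1 - \dfrac{p_2}{1 - p_1}\right)^{n - x - y} & \text{if } y = 0, 1, 2, \ldots, n - x,\\ 0 & \text{otherwise},\end{cases}\tag{62}$$
which is $b(n - x, p_2/(1 - p_1))$. Thus
$$E\{Y \mid x\} = (n - x)\frac{p_2}{1 - p_1}. \tag{63}$$
Similarly,
$$E\{X \mid y\} = (n - y)\frac{p_1}{1 - p_2}. \tag{64}$$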
Finally, we note that if $X = (X_1, X_2, \ldots, X_k)$ and $Y = (Y_1, Y_2, \ldots, Y_k)$ are two independent multinomial RVs with common parameter $(p_1, p_2, \ldots, p_k)$, then $Z = X + Y$ is also a multinomial RV with probabilities $(p_1, p_2, \ldots, p_k)$. This follows easily if one employs the MGF technique, using (57). Actually, this property characterizes the multinomial distribution. If $X$ and $Y$ are $k$-dimensional, nonnegative, independent random vectors, and if $Z = X + Y$ is a multinomial random vector with parameter $(p_1, p_2, \ldots, p_k)$, then $X$ and $Y$ also have multinomial distributions with the same parameter. This result is due to Shanbhag and Basawa [101] and will not be proved here.


5.2.10 Multivariate Hypergeometric Distribution 


Consider an urn containing $N$ items divided into $k$ categories containing $n_1, n_2, \ldots, n_k$ items, respectively, where $\sum_{j=1}^{k}n_j = N$. A random sample, without replacement, of size $n$ is taken from the urn. Let $X_i$ = number of items of type $i$ in the sample. Then
$$P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \frac{\prod_{j=1}^{k}\binom{n_j}{x_j}}{\binom{N}{n}}, \tag{65}$$
where $x_j = 0, 1, \ldots, \min(n, n_j)$, and $\sum_{j=1}^{k}x_j = n$.

We say that $(X_1, X_2, \ldots, X_k)$ has a multivariate hypergeometric distribution if its joint PMF is given by (65). It is clear that each $X_j$ has a marginal hypergeometric distribution. Moreover, the conditional distributions are also hypergeometric. Thus
$$P\{X_i = x_i \mid X_j = x_j\} = \frac{\binom{n_i}{x_i}\binom{N - n_i - n_j}{n - x_i - x_j}}{\binom{N - n_j}{n - x_j}}$$
and
$$P\{X_i = x_i \mid X_j = x_j, X_\ell = x_\ell\} = \frac{\binom{n_i}{x_i}\binom{N - n_i - n_j - n_\ell}{n - x_i - x_j - x_\ell}}{\binom{N - n_j - n_\ell}{n - x_j - x_\ell}},$$
and so on. It is therefore easy to write down the marginal and conditional means and variances. We leave it to the reader to show that
$$EX_i = \frac{nn_i}{N},$$
$$\operatorname{var}(X_i) = n\frac{n_i}{N}\cdot\frac{N - n_i}{N}\cdot\frac{N - n}{N - 1},$$
and
$$\operatorname{cov}(X_i, X_j) = -n\frac{N - n}{N - 1}\cdot\frac{n_in_j}{N^2}.$$
5.2.11 Multivariate Negative Binomial Distribution

Consider the setup of Section 5.2.9, where each replication of an experiment terminates in one of $k$ mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let $p_j = P(A_j)$, $j = 1, 2, \ldots, k$. Suppose that the experiment is repeated until event $A_k$ is observed for the $r$th time, $r \ge 1$. Then
$$P\{X_1 = x_1, \ldots, X_{k-1} = x_{k-1}, X_k = r\} = \frac{(x_1 + x_2 + \cdots + x_{k-1} + r - 1)!}{x_1!\,x_2!\cdots x_{k-1}!\,(r - 1)!}\,p_k^r\prod_{j=1}^{k-1}p_j^{x_j} \tag{66}$$
for $x_j = 0, 1, 2, \ldots$ $(j = 1, 2, \ldots, k - 1)$, $1 \le r < \infty$, $0 < p_j < 1$, $\sum_{j=1}^{k-1}p_j < 1$, and $p_k = 1 - \sum_{j=1}^{k-1}p_j$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has a multivariate negative binomial (or negative multinomial) distribution if its joint PMF is given by (66).
It is easy to see that the marginal PMF of any subset of $(X_1, X_2, \ldots, X_{k-1})$ is negative multinomial. In particular, each $X_j$ has a negative binomial distribution. We leave it to the reader to show that
$$M(s_1, s_2, \ldots, s_{k-1}) = Ee^{\sum_{j=1}^{k-1}s_jX_j} = p_k^r\left(1 - \sum_{j=1}^{k-1}p_je^{s_j}\right)^{-r} \tag{67}$$
and
$$\operatorname{cov}(X_i, X_j) = \frac{rp_ip_j}{p_k^2}. \tag{68}$$
Pk 
PROBLEMS 5.2

1. (a) Let us write
$$b(k; n, p) = \binom{n}{k}p^k(1 - p)^{n - k}, \qquad k = 0, 1, 2, \ldots, n.$$
Show that as $k$ goes from 0 to $n$, $b(k; n, p)$ first increases monotonically and then decreases monotonically. The greatest value is assumed when $k = m$, where $m$ is an integer such that
$$(n + 1)p - 1 < m \le (n + 1)p,$$
except that $b(m - 1; n, p) = b(m; n, p)$ when $m = (n + 1)p$.
(b) If $k > np$, then
$$P\{X \ge k\} \le b(k; n, p)\frac{(k + 1)(1 - p)}{k + 1 - (n + 1)p};$$
and if $k < np$, then
$$P\{X \le k\} \le b(k; n, p)\frac{(n - k + 1)p}{(n + 1)p - k}.$$
2. Generalize the result in Theorem 10 to $n$ independent Poisson RVs; that is, if $X_1, X_2, \ldots, X_n$ are independent RVs with $X_i \sim P(\lambda_i)$, $i = 1, 2, \ldots, n$, the conditional distribution of $X_1, X_2, \ldots, X_n$, given $\sum_{j=1}^{n}X_j = t$, is multinomial with parameters $t, \lambda_1/\sum_{j=1}^{n}\lambda_j, \ldots, \lambda_n/\sum_{j=1}^{n}\lambda_j$.

3. Let $X_1, X_2$ be independent RVs with $X_i \sim b(n_i, \frac{1}{2})$, $i = 1, 2$. What is the PMF of $X_1 - X_2 + n_2$?

4. A box contains $N$ identical balls numbered 1 through $N$. Of these balls, $n$ are drawn at a time. Let $X_1, X_2, \ldots, X_n$ denote the numbers on the $n$ balls drawn. Let $S_n = \sum_{k=1}^{n}X_k$. Find $\operatorname{var}(S_n)$.

5. From a box containing $N$ identical balls marked 1 through $N$, $M$ balls are drawn one after another without replacement. Let $X_i$ denote the number on the $i$th ball drawn, $i = 1, 2, \ldots, M$, $1 \le M \le N$. Let $Y = \max(X_1, X_2, \ldots, X_M)$. Find the DF and the PMF of $Y$. Also find the conditional distribution of $X_1, X_2, \ldots, X_M$, given $Y = y$. Find $EY$ and $\operatorname{var}(Y)$.

6. Let $f(x; r, p)$, $x = 0, 1, 2, \ldots$, denote the PMF of an $NB(r; p)$ RV. Show that the terms $f(x; r, p)$ first increase monotonically and then decrease monotonically. When is the greatest value assumed?

7. Show that the terms
$$P\{X = k\} = e^{-\lambda}\frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots,$$
of the Poisson PMF reach their maximum when $k$ is the largest integer $\le \lambda$, and at $(\lambda - 1)$ and $\lambda$ if $\lambda$ is an integer.

8. Show that
$$\binom{n}{k}p^k(1 - p)^{n - k} \to e^{-\lambda}\frac{\lambda^k}{k!}$$
as $n \to \infty$ and $p \to 0$, so that $np = \lambda$ remains constant. (Hint: Use Stirling's approximation, namely, $n! \sim \sqrt{2\pi}\,n^{n + 1/2}e^{-n}$ as $n \to \infty$.)
9. A biased coin is tossed indefinitely. Let $p$ $(0 < p < 1)$ be the probability of success (heads). Let $Y_1$ denote the length of the first run and $Y_2$ the length of the second run. Find the PMFs of $Y_1$ and $Y_2$, and show that $EY_1 = q/p + p/q$, $EY_2 = 2$. If $Y_n$ denotes the length of the $n$th run, $n \ge 1$, what is the PMF of $Y_n$? Find $EY_n$.

10. Show that
$$\frac{\binom{Np}{k}\binom{N(1 - p)}{n - k}}{\binom{N}{n}} \to \binom{n}{k}p^k(1 - p)^{n - k}$$
as $N \to \infty$.

11. Show that
$$\binom{r + k - 1}{k}p^r(1 - p)^k \to e^{-\lambda}\frac{\lambda^k}{k!}$$
as $p \to 1$ and $r \to \infty$ in such a way that $r(1 - p) = \lambda$ remains fixed.

12. Let $X$ and $Y$ be independent geometric RVs. Show that $\min(X, Y)$ and $X - Y$ are independent.

13. Let $X$ and $Y$ be independent RVs with PMFs $P\{X = k\} = p_k$, $P\{Y = k\} = q_k$, $k = 0, 1, 2, \ldots$, where $p_k, q_k > 0$ and $\sum_{k=0}^{\infty}p_k = \sum_{k=0}^{\infty}q_k = 1$. Let
$$P\{X = k \mid X + Y = t\} = \binom{t}{k}\alpha_t^k(1 - \alpha_t)^{t - k}, \qquad 0 \le k \le t.$$
Then $\alpha_t = \alpha$ for all $t$, and
$$p_k = \frac{e^{-\theta\beta}(\theta\beta)^k}{k!} \quad\text{and}\quad q_k = \frac{e^{-\theta}\theta^k}{k!},$$
where $\beta = \alpha/(1 - \alpha)$, and $\theta > 0$ is arbitrary. (Chatterji [12])

14. Generalize the result of Example 10 to the case of $k$ urns, $k \ge 3$.

15. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters $n, p_1, p_2, \ldots, p_{k-1}$. Write
$$Y = \sum_{i=1}^{k}\frac{(X_i - np_i)^2}{np_i},$$
where $p_k = 1 - p_1 - \cdots - p_{k-1}$, and $X_k = n - X_1 - \cdots - X_{k-1}$. Find $EY$ and $\operatorname{var}(Y)$.

16. Let $X_1, X_2$ be iid RVs with common DF $F$, having positive mass at $0, 1, 2, \ldots$. Also, let $U = \max(X_1, X_2)$ and $V = X_1 - X_2$. Then
$$P\{U = j, V = 0\} = P\{U = j\}P\{V = 0\}$$
for all $j$ if and only if $F$ is a geometric distribution. (Srivastava [107])


17. Let $X$ and $Y$ be mutually independent RVs, taking nonnegative integer values. Then
$$P\{X \le n\} - P\{X + Y \le n\} = \alpha P\{X + Y = n\}$$
holds for $n = 0, 1, 2, \ldots$ and some $\alpha > 0$ if and only if
$$P\{Y = n\} = \frac{1}{1 + \alpha}\left(\frac{\alpha}{1 + \alpha}\right)^n, \qquad n = 0, 1, 2, \ldots.$$
(Hint: Use Problem 3.3.8.) (Puri [81])

18. Let $X_1, X_2, \ldots$ be a sequence of independent $b(1, p)$ RVs with $0 < p < 1$. Also, let $Z_N = \sum_{i=1}^{N}X_i$, where $N$ is a $P(\lambda)$ RV that is independent of the $X_i$'s. Show that $Z_N$ and $N - Z_N$ are independent.

19. Prove Theorems 5, 7, 8, and 11.


5.3 SOME CONTINUOUS DISTRIBUTIONS

In this section we study some of the most frequently used absolutely continuous distributions and describe their important properties. Before we introduce specific distributions, it should be remarked that associated with each PDF $f$ there is an index or a parameter $\theta$ (possibly multidimensional) which takes values in an index set $\Theta$. For any particular choice of $\theta \in \Theta$ we obtain a specific PDF $f_\theta$ from the family of PDFs $\{f_\theta, \theta \in \Theta\}$.

Let $X$ be an RV with PDF $f_\theta(x)$, where $\theta$ is a real-valued parameter. We say that $\theta$ is a location parameter and $\{f_\theta\}$ is a location family if $X - \theta$ has PDF $f(x)$ which does not depend on $\theta$. The parameter $\theta$ is said to be a scale parameter and $\{f_\theta\}$ is a scale family of PDFs if $X/\theta$ has PDF $f(x)$ which is free of $\theta$. If $\theta = (\mu, \sigma)$ is two-dimensional, we say that $\theta$ is a location-scale parameter if the PDF of $(X - \mu)/\sigma$ is free of $\mu$ and $\sigma$. In that case, $\{f_\theta\}$ is known as a location-scale family.

It is easily seen that $\theta$ is a location parameter if and only if $f_\theta(x) = f(x - \theta)$, a scale parameter if and only if $f_\theta(x) = (1/\theta)f(x/\theta)$, and a location-scale parameter if $f_\theta(x) = (1/\sigma)f((x - \mu)/\sigma)$, $\sigma > 0$, for some PDF $f$. The density $f$ is called the standard PDF for the family $\{f_\theta, \theta \in \Theta\}$.

A location parameter simply relocates or shifts the graph of the PDF $f$ without changing its shape. A scale parameter stretches (if $\theta > 1$) or contracts (if $\theta < 1$) the graph of $f$. A location-scale parameter, on the other hand, stretches or contracts the graph of $f$ with the scale parameter and then shifts the graph to locate at $\mu$ (see Fig. 1).

Some PDFs also have a shape parameter. Changing its value alters the shape of the graph. For the Poisson distribution $\lambda$ is a shape parameter.


[Fig. 1. (a) Exponential location family; (b) exponential scale family.]
For the following PDF,
$$f(x; \mu, \beta, \alpha) = \frac{1}{\beta\Gamma(\alpha)}\left(\frac{x - \mu}{\beta}\right)^{\alpha - 1}\exp\left\{-\frac{x - \mu}{\beta}\right\}, \qquad x > \mu,$$
and $= 0$ otherwise, $\mu$ is a location, $\beta$ a scale, and $\alpha$ a shape parameter. The standard density for this location-scale family is
$$f(x) = \frac{1}{\Gamma(\alpha)}x^{\alpha - 1}e^{-x}, \qquad x > 0,$$
and $= 0$ otherwise. For the standard PDF $f$, $\alpha$ is a shape parameter.


5.3.1 Uniform Distribution (Rectangular Distribution) 
Definition 1. An RV $X$ is said to have a uniform distribution on the interval $[a, b]$, $-\infty < a < b < \infty$, if its PDF is given by
$$f(x) = \begin{cases}\dfrac{1}{b - a}, & a \le x \le b,\\ 0, & \text{otherwise}.\end{cases}\tag{1}$$
We will write $X \sim U[a, b]$ if $X$ has a uniform distribution on $[a, b]$.
The endpoint $a$ or $b$ or both may be excluded. Clearly,
$$\int_{-\infty}^{\infty}f(x)\,dx = 1,$$
so that (1) indeed defines a PDF. The DF of $X$ is given by
$$F(x) = \begin{cases}0, & x < a,\\ \dfrac{x - a}{b - a}, & a \le x < b,\\ 1, & b \le x;\end{cases}\tag{2}$$
$$EX = \frac{a + b}{2}, \qquad EX^k = \frac{b^{k+1} - a^{k+1}}{(k + 1)(b - a)}, \quad k > 0 \text{ an integer}; \tag{3}$$
$$\operatorname{var}(X) = \frac{(b - a)^2}{12}; \tag{4}$$
$$M(t) = \frac{e^{tb} - e^{ta}}{t(b - a)}, \qquad t \ne 0. \tag{5}$$
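Example 1. Let $X$ have a PDF given by
$$f(x) = \begin{cases}\lambda e^{-\lambda x}, & 0 < x < \infty,\ \lambda > 0,\\ 0, & \text{otherwise}.\end{cases}$$
Then
$$F(x) = \begin{cases}0, & x \le 0,\\ 1 - e^{-\lambda x}, & x > 0.\end{cases}$$
Let $Y = F(X) = 1 - e^{-\lambda X}$. The PDF of $Y$ is given by
$$f_Y(y) = \frac{1}{\lambda(1 - y)}\,\lambda\exp\left\{-\lambda\left[-\frac{1}{\lambda}\log(1 - y)\right]\right\} = 1, \qquad 0 < y < 1.$$
Let us define $f_Y(y) = 1$ at $y = 1$. Then we see that $Y$ has density function
$$f_Y(y) = \begin{cases}1, & 0 < y \le 1,\\ 0, & \text{otherwise},\end{cases}$$
which is the $U[0, 1]$ distribution. That this is not a mere coincidence is shown in the following theorem.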


Theorem 1 (Probability Integral Transformation). Let X be an RV with a 
continuous DF F. Then F(X) has the uniform distribution on [0, 1]. 


The proof is left as an exercise.
The reader is asked to consider what happens in the case where $F$ is the DF of a discrete RV. In the converse direction the following result holds.


Theorem 2. Let $F$ be any DF, and let $X$ be a $U[0, 1]$ RV. Then there exists a function $h$ such that $h(X)$ has DF $F$, that is,
$$P\{h(X) \le x\} = F(x) \qquad \text{for all } x \in (-\infty, \infty). \tag{6}$$
Proof. If $F$ is the DF of a discrete RV $Y$, let
$$P\{Y = y_k\} = p_k, \qquad k = 1, 2, \ldots.$$
Define $h$ as follows:
$$h(x) = \begin{cases}y_1 & \text{if } 0 \le x \le p_1,\\ y_2 & \text{if } p_1 < x \le p_1 + p_2,\\ \ \vdots & \end{cases}$$
Then
$$P\{h(X) = y_1\} = P\{0 \le X \le p_1\} = p_1,$$
$$P\{h(X) = y_2\} = P\{p_1 < X \le p_1 + p_2\} = p_2,$$
and, in general,
$$P\{h(X) = y_k\} = p_k, \qquad k = 1, 2, \ldots.$$
Thus $h(X)$ is a discrete RV with DF $F$.
If $F$ is continuous and strictly increasing, $F^{-1}$ is well defined, and we take $h(X) = F^{-1}(X)$. We have
$$P\{h(X) \le x\} = P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x),$$
as asserted.

In general, define
$$F^{-1}(y) = \inf\{x : F(x) \ge y\}, \tag{7}$$
and let $h(X) = F^{-1}(X)$. Then we have
$$\{F^{-1}(y) \le x\} = \{y \le F(x)\}. \tag{8}$$
Indeed, $F^{-1}(y) \le x$ implies that for every $\varepsilon > 0$, $y \le F(x + \varepsilon)$. Since $\varepsilon > 0$ is arbitrary and $F$ is continuous on the right, we let $\varepsilon \to 0$ and conclude that $y \le F(x)$. Since $y \le F(x)$ implies that $F^{-1}(y) \le x$ by definition (7), it follows that (8) holds generally. Thus
$$P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x).$$

Theorem 2 is quite useful in generating samples with the help of the uniform distribution.
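The generalized inverse (7) is what a sampler actually evaluates for a discrete DF. The sketch below is illustrative only (`make_quantile` is a hypothetical helper, and the three-point distribution on $\{0, 1, 2\}$ is our own example): it finds $\inf\{x : F(x) \ge u\}$ by binary search over the cumulative probabilities and checks relation (8) on a grid of $u$ values, using exact fractions to avoid rounding at the jump points.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

def make_quantile(values, probs):
    """Return (u -> inf{x : F(x) >= u}, list of F at the atoms) for a discrete DF.

    values must be sorted increasingly and probs must sum to 1."""
    cdf = list(accumulate(probs))          # F(values[0]), F(values[1]), ...
    def quantile(u):
        # bisect_left gives the first index i with cdf[i] >= u, i.e. the infimum in (7)
        return values[bisect_left(cdf, u)]
    return quantile, cdf

values = [0, 1, 2]
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
h, cdf = make_quantile(values, probs)

# relation (8): F^{-1}(u) <= x exactly when u <= F(x)
for i in range(1, 21):
    u = Fraction(i, 20)
    for x, Fx in zip(values, cdf):
        assert (h(u) <= x) == (u <= Fx)
```

Applied to $U[0, 1]$ variates, `h` reproduces the step function $h$ used in the proof of Theorem 2.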


Example 2. Let $F$ be the DF defined by
$$F(x) = \begin{cases}0, & x < 0,\\ 1 - e^{-x}, & x \ge 0.\end{cases}$$
Then the inverse to $y = 1 - e^{-x}$, $x > 0$, is $x = -\log(1 - y)$, $0 < y < 1$. Thus
$$h(y) = -\log(1 - y),$$
and $-\log(1 - X)$ has the required distribution, where $X$ is a $U[0, 1]$ RV.
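Example 2 translates directly into an inverse-transform sampler. This is a minimal sketch assuming Python's `random.Random` as the source of uniform variates (it returns values in $[0, 1)$, so $1 - u > 0$); the fixed seed only makes the run reproducible. The sample mean and the fraction of draws not exceeding 1 should be close to $1$ and $F(1) = 1 - e^{-1}$, respectively.

```python
import math
import random

def exp_sample(rng):
    # h(u) = -log(1 - u) from Example 2, applied to a uniform variate u in [0, 1)
    return -math.log(1.0 - rng.random())

rng = random.Random(2024)           # fixed seed so the run is reproducible
draws = [exp_sample(rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
frac_le_1 = sum(d <= 1.0 for d in draws) / len(draws)
print(mean, frac_le_1, 1 - math.exp(-1))
```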


Theorem 3. Let $X$ be an RV defined on $(0, 1]$. If $P\{x < X \le y\}$ depends only on $y - x$ for all $0 \le x \le y \le 1$, then $X$ is $U[0, 1]$.




Proof. Let $P\{x < X \le y\} = f(y - x)$; then $f(x + y) = P\{0 < X \le x + y\} = P\{0 < X \le x\} + P\{x < X \le x + y\} = f(x) + f(y)$. Note that $f$ is continuous from the right. We have
$$f(x) = f(x) + f(0),$$
so that
$$f(0) = 0.$$
We will show that $f(x) = cx$ for some constant $c$. It suffices to prove the result for positive $x$. Let $m$ be a positive integer; then
$$f(mx) = f(x) + \cdots + f(x) = mf(x).$$
Letting $x = n/m$, we get
$$f(n) = mf\left(\frac{n}{m}\right),$$
so that
$$f\left(\frac{n}{m}\right) = \frac{1}{m}f(n) = \frac{n}{m}f(1)$$
for positive integers $n$ and $m$. Letting $f(1) = c$, we have proved that
$$f(x) = cx$$
for rational numbers $x$.

To complete the proof we consider the case where $x$ is a positive irrational number. Then we can find a decreasing sequence of positive rationals $x_1, x_2, \ldots$ such that $x_n \to x$. Since $f$ is right continuous,
$$f(x) = \lim_{n \to \infty}f(x_n) = \lim_{n \to \infty}cx_n = cx.$$
Now, for $0 \le x \le 1$,
$$F(x) = P\{X \le 0\} + P\{0 < X \le x\} = F(0) + P\{0 < X \le x\} = f(x) = cx, \qquad 0 \le x \le 1,$$
since $F(0) = 0$ for an RV defined on $(0, 1]$. Since $F(1) = 1$, we must have $c = 1$, so that
$$F(x) = x, \qquad 0 \le x \le 1.$$
This completes the proof.
5.3.2 Gamma Distribution
The integral

$$\Gamma(\alpha) = \int_{0+}^{\infty} x^{\alpha-1} e^{-x}\,dx \tag{9}$$

converges or diverges according as α > 0 or α ≤ 0. For α > 0 the integral in (9) is called the gamma function. In particular, if α = 1, Γ(1) = 1. If α > 1, integration by parts yields

$$\Gamma(\alpha) = (\alpha - 1)\int_0^{\infty} x^{\alpha-2} e^{-x}\,dx = (\alpha - 1)\,\Gamma(\alpha - 1). \tag{10}$$

If α = n is a positive integer, then

$$\Gamma(n) = (n - 1)!. \tag{11}$$


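Also writing x = y²/2 in Γ(1/2), we see that

$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{2}\int_0^{\infty} e^{-y^2/2}\,dy.$$

Now consider the integral $J = \int_{-\infty}^{\infty} e^{-y^2/2}\,dy$. We have

$$J^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{x^2 + y^2}{2}\right\} dx\,dy,$$

and changing to polar coordinates, we get

$$J^2 = \int_0^{2\pi}\int_0^{\infty} r \exp\left\{-\frac{r^2}{2}\right\} dr\,d\theta = 2\pi.$$

Thus J = √(2π), and since the integrand e^{−y²/2} is symmetric about 0, Γ(1/2) = √2 · J/2. It follows that Γ(1/2) = √π.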
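A quick numerical check of (10), (11), and Γ(1/2) = √π can be sketched in Python with the standard `math` module. The trapezoid-rule helper `gaussian_integral` and its truncation at |y| = 12 are our own assumptions, not part of the text; `math.gamma` serves only as an independent reference value.

```python
import math


def gaussian_integral(half_width: float = 12.0, steps: int = 100_000) -> float:
    """Trapezoid-rule approximation of J, the integral of exp(-y^2/2) over the real line.

    The range is truncated to [-half_width, half_width]; beyond |y| = 12 the
    integrand is below exp(-72), so the truncation error is negligible here.
    """
    def f(y: float) -> float:
        return math.exp(-y * y / 2)

    h = 2 * half_width / steps
    total = 0.5 * (f(-half_width) + f(half_width))
    total += sum(f(-half_width + i * h) for i in range(1, steps))
    return total * h


J = gaussian_integral()
gamma_half = math.sqrt(2) * J / 2  # Γ(1/2) = √2 ∫_0^∞ exp(-y²/2) dy = √2 · J / 2

print(J, math.sqrt(2 * math.pi))                        # J² = 2π
print(gamma_half, math.gamma(0.5), math.sqrt(math.pi))  # Γ(1/2) = √π

# Recursion (10) and its integer case (11): Γ(n) = (n - 1)!
for n in range(1, 10):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))
```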
Let us write x = y/β, β > 0, in the integral in (9). Then

$$\Gamma(\alpha) = \int_0^{\infty} \frac{y^{\alpha-1}}{\beta^{\alpha}}\, e^{-y/\beta}\,dy, \tag{12}$$

so that

$$\int_0^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}\,dy = 1. \tag{13}$$

Since the integrand in (13) is positive for y > 0, it follows that the function

$$f(y) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}, & 0 < y < \infty,\\ 0, & y \le 0, \end{cases} \tag{14}$$

defines a PDF for α > 0, β > 0.


Definition 2. An RV X with PDF defined by (14) is said to have a gamma distribution with parameters α and β. We will write X ~ G(α, β).


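Figure 2 gives graphs of some gamma PDFs.
The DF of a G(α, β) RV is given by

$$F(x) = \begin{cases} 0, & x \le 0,\\ \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\displaystyle\int_0^{x} y^{\alpha-1} e^{-y/\beta}\,dy, & x > 0. \end{cases} \tag{15}$$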


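The MGF of X is easily computed. We have

$$M(t) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty} e^{ty}\, y^{\alpha-1} e^{-y/\beta}\,dy = \frac{1}{(1 - \beta t)^{\alpha}}\int_0^{\infty} \frac{(1 - \beta t)^{\alpha}}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y(1 - \beta t)/\beta}\,dy = \frac{1}{(1 - \beta t)^{\alpha}}, \quad t < \frac{1}{\beta}. \tag{16}$$

It follows that

$$EX = M'(t)\big|_{t=0} = \alpha\beta \tag{17}$$

and

$$EX^2 = M''(t)\big|_{t=0} = \alpha(\alpha + 1)\beta^2, \tag{18}$$

so that

$$\operatorname{var}(X) = \alpha\beta^2. \tag{19}$$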


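Indeed, we can compute the moment of order n, for any n such that α + n > 0, directly from the density. We have

$$EX^n = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty} e^{-x/\beta} x^{\alpha+n-1}\,dx = \beta^n\,\frac{\Gamma(\alpha + n)}{\Gamma(\alpha)} \tag{20}$$

$$= \beta^n(\alpha + n - 1)(\alpha + n - 2)\cdots\alpha,$$

where the last form holds when n is a positive integer.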
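The closed form (20) and the special cases (17) to (19) can be checked against direct numerical integration of the density (14). In this sketch the Simpson-rule helper, the truncation of the range at x = 100, and the parameter choice α = 2.5, β = 1.5 are illustrative assumptions.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """PDF (14) of G(alpha, beta) for x >= 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


def gamma_moment(n: int, alpha: float, beta: float) -> float:
    """Closed form (20): E X^n = beta^n Γ(alpha + n) / Γ(alpha)."""
    return beta ** n * math.gamma(alpha + n) / math.gamma(alpha)


def simpson(f, a: float, b: float, m: int = 20_000) -> float:
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


alpha, beta = 2.5, 1.5
# Total mass, E X and E X^2 by quadrature (the lambda is evaluated before n changes).
numeric = [simpson(lambda x: x ** n * gamma_pdf(x, alpha, beta), 0.0, 100.0) for n in (0, 1, 2)]
mean, second = gamma_moment(1, alpha, beta), gamma_moment(2, alpha, beta)
print(numeric)
print(1.0, mean, second)                       # 1, αβ, α(α+1)β² from (17) and (18)
print(second - mean ** 2, alpha * beta ** 2)   # var(X) = αβ² as in (19)
```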


Fig. 2. Gamma density functions.
The special case when α = 1 leads to the exponential distribution with parameter β. The PDF of an exponentially distributed RV is therefore

$$f(x) = \begin{cases} \beta^{-1} e^{-x/\beta}, & x > 0,\\ 0, & \text{otherwise}. \end{cases} \tag{21}$$

Note that we can speak of the exponential distribution on (−∞, 0). The PDF of such an RV is

$$f(x) = \begin{cases} \beta^{-1} e^{x/\beta}, & x < 0,\\ 0, & x \ge 0. \end{cases} \tag{22}$$

Clearly, if X ~ G(1, β), we have

$$EX^n = n!\,\beta^n, \tag{23}$$

$$EX = \beta \quad\text{and}\quad \operatorname{var}(X) = \beta^2, \tag{24}$$

and

$$M(t) = (1 - \beta t)^{-1}, \quad t < \frac{1}{\beta}. \tag{25}$$


Another special case of importance is when α = n/2, n > 0 (an integer), and β = 2.

Definition 3. An RV X is said to have a chi-square distribution (χ²-distribution) with n degrees of freedom, where n is a positive integer, if its PDF is given by

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\,2^{n/2}}\, e^{-x/2} x^{n/2-1}, & 0 < x < \infty,\\ 0, & x \le 0. \end{cases} \tag{26}$$

We will write X ~ χ²(n) for a χ² RV with n degrees of freedom (d.f.).

If X ~ χ²(n), then

$$EX = n, \quad \operatorname{var}(X) = 2n, \tag{27}$$

$$EX^k = \frac{2^k\,\Gamma(n/2 + k)}{\Gamma(n/2)}, \tag{28}$$

and

$$M(t) = (1 - 2t)^{-n/2}, \quad t < \frac{1}{2}. \tag{29}$$

Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent RVs such that $X_j \sim G(\alpha_j, \beta)$, j = 1, 2, ..., n. Then $S_n = \sum_{k=1}^{n} X_k$ is a $G\left(\sum_{j=1}^{n}\alpha_j, \beta\right)$ RV.
Corollary 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, each with an exponential distribution with parameter β. Then $S_n$ is a G(n, β) RV.

Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs such that $X_j \sim \chi^2(r_j)$, j = 1, 2, ..., n, then $S_n$ is a $\chi^2\left(\sum_{j=1}^{n} r_j\right)$ RV.

Theorem 5. Let X ~ U(0, 1). Then Y = −2 log X is χ²(2).

Corollary. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common distribution U(0, 1). Then $-2\sum_{i=1}^{n}\log X_i = 2\log\left(1/\prod_{i=1}^{n} X_i\right)$ is χ²(2n).
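The corollary to Theorem 5 can be illustrated by simulation. The sketch below assumes Python's `random` module as the source of U(0, 1) variables and compares the sample mean and variance of −2 Σ log X_i with the values 2n and 4n that (27) gives for a χ²(2n) RV. The helper name `minus_two_log_sum`, the seed, and n = 3 are illustrative choices.

```python
import math
import random


def minus_two_log_sum(n: int, rng: random.Random) -> float:
    """-2 * sum(log X_i) for n iid U(0, 1) draws X_i (chi-square with 2n d.f. by the corollary)."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always finite.
    return -2.0 * sum(math.log(1.0 - rng.random()) for _ in range(n))


rng = random.Random(7)
n, reps = 3, 50_000
ys = [minus_two_log_sum(n, rng) for _ in range(reps)]
mean = sum(ys) / reps
var = sum((y - mean) ** 2 for y in ys) / (reps - 1)
print(f"sample mean {mean:.3f} (theory {2 * n}), sample variance {var:.3f} (theory {4 * n})")
```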


Theorem 6. Let X ~ G(α₁, β) and Y ~ G(α₂, β) be independent RVs. Then X + Y and X/Y are independent.

Corollary. Let X ~ G(α₁, β) and Y ~ G(α₂, β) be independent RVs. Then X + Y and X/(X + Y) are independent.


The converse of Theorem 6 is also true. The result is due to Lukacs [66], and we state it without proof.

Theorem 7. Let X and Y be two nondegenerate RVs that take only positive values. Suppose that U = X + Y and V = X/Y are independent. Then X and Y have gamma distributions with the same parameter β.

Theorem 8. Let X ~ G(1, β). Then the RV X has "no memory," that is,

$$P\{X > r + s \mid X > s\} = P\{X > r\} \tag{30}$$

for any two positive real numbers r and s.


The proof is left as an exercise. 
The converse of Theorem 8 is also true in the following sense. 


Theorem 9. Let F be a DF such that F(x) = 0 if x < 0, F(x) < 1 if x > 0, and

$$\frac{1 - F(x + y)}{1 - F(y)} = 1 - F(x) \quad\text{for all } x, y > 0. \tag{31}$$

Then there exists a constant β > 0 such that

$$1 - F(x) = e^{-x/\beta}, \quad x > 0. \tag{32}$$

Proof. Equation (31) is equivalent to

$$g(x + y) = g(x) + g(y)$$

if we write g(x) = log{1 − F(x)}. From the proof of Theorem 3 it is clear that the only right continuous solution is g(x) = cx. Hence F(x) = 1 − e^{cx}, x > 0. Since F(x) → 1 as x → ∞, it follows that c < 0, and taking β = −1/c completes the proof.


Theorem 10. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then $X_i \sim G(1, n\beta)$, i = 1, 2, ..., n, if and only if $X_{(1)} = \min(X_1, \ldots, X_n)$ is G(1, β).

Note that, if $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim G(1, \beta_i)$, i = 1, 2, ..., n, then $X_{(1)}$ is a $G\left(1, 1/\sum_{i=1}^{n}\beta_i^{-1}\right)$ RV.

The following result describes the relationship between exponential and Poisson RVs.

Theorem 11. Let $X_1, X_2, \ldots$ be a sequence of iid RVs having common exponential density with parameter β > 0. Let $S_n = \sum_{k=1}^{n} X_k$ be the nth partial sum, n = 1, 2, ..., and suppose that t > 0. If Y = number of $S_n \in [0, t]$, then Y is a P(t/β) RV.


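Proof. We have

$$P\{Y = 0\} = P\{S_1 > t\} = \frac{1}{\beta}\int_t^{\infty} e^{-x/\beta}\,dx = e^{-t/\beta},$$

so that the assertion holds for Y = 0. Let n be a positive integer. Since the $X_i$'s are nonnegative, $S_n$ is nondecreasing, and

$$P\{Y = n\} = P\{S_n \le t,\ S_{n+1} > t\}. \tag{33}$$

Now

$$P\{S_n \le t\} = P\{S_n \le t,\ S_{n+1} > t\} + P\{S_{n+1} \le t\}. \tag{34}$$

It follows that

$$P\{Y = n\} = P\{S_n \le t\} - P\{S_{n+1} \le t\}, \tag{35}$$

and since $S_n \sim G(n, \beta)$, we have, on integrating the second integral by parts,

$$P\{Y = n\} = \int_0^t \frac{1}{\Gamma(n)\beta^n}\, x^{n-1} e^{-x/\beta}\,dx - \int_0^t \frac{1}{\Gamma(n+1)\beta^{n+1}}\, x^{n} e^{-x/\beta}\,dx = \frac{(t/\beta)^n}{n!}\, e^{-t/\beta},$$

as asserted.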
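Theorem 11 is the familiar link between exponential interarrival times and Poisson counts, and a rough simulation makes it concrete. The sketch below assumes `random.expovariate`, which takes the rate 1/β rather than the mean β. The values t = 3, β = 0.5 are arbitrary illustrative choices, so Y should behave like a P(6) RV. The helper `count_arrivals` is a hypothetical name.

```python
import math
import random


def count_arrivals(t: float, beta: float, rng: random.Random) -> int:
    """Y = number of partial sums S_n = X_1 + ... + X_n lying in [0, t], X_k iid exponential with mean beta."""
    s, count = 0.0, 0
    while True:
        s += rng.expovariate(1.0 / beta)  # expovariate takes the rate 1/beta
        if s > t:
            return count
        count += 1


rng = random.Random(1)
t, beta, reps = 3.0, 0.5, 100_000
counts = [count_arrivals(t, beta, rng) for _ in range(reps)]
lam = t / beta
for n in range(4):
    empirical = counts.count(n) / reps
    poisson = math.exp(-lam) * lam ** n / math.factorial(n)
    print(n, round(empirical, 4), round(poisson, 4))
```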


Theorem 12. If X and Y are independent exponential RVs with parameter β, then Z = X/(X + Y) has a U(0, 1) distribution.




Note that in view of Theorem 7, Theorem 12 characterizes the exponential distribution in the following sense. Let X and Y be independent RVs that are nondegenerate and take only positive values. Suppose that X + Y and X/Y are independent. If X/(X + Y) is U(0, 1), X and Y both have the exponential distribution with parameter β. This follows since by Theorem 7, X and Y must have the gamma distribution with parameter β. Thus X/(X + Y) must have (see Theorem 14) the PDF

$$f(x) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}, \quad 0 < x < 1,$$

and this is the uniform density on (0, 1) if and only if α₁ = α₂ = 1. Thus X and Y both have the G(1, β) distribution.


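Theorem 13. Let X be a P(λ) RV. Then

$$P\{X \le K\} = \frac{1}{K!}\int_{\lambda}^{\infty} e^{-x} x^K\,dx \tag{36}$$

expresses the DF of X in terms of an incomplete gamma function.

Proof.

$$\frac{d}{d\lambda} P\{X \le K\} = \sum_{k=0}^{K} \frac{d}{d\lambda}\left(e^{-\lambda}\frac{\lambda^k}{k!}\right) = \sum_{k=0}^{K}\left(e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!} - e^{-\lambda}\frac{\lambda^k}{k!}\right) = -e^{-\lambda}\frac{\lambda^K}{K!},$$

since the sum telescopes (the term with $\lambda^{-1}/(-1)!$ is taken to be 0). Since P{X ≤ K} → 0 as λ → ∞, it follows that

$$P\{X \le K\} = \frac{1}{K!}\int_{\lambda}^{\infty} e^{-x} x^K\,dx,$$

as asserted.
An alternative way of writing (36) is the following:

$$P\{X \le K\} = P\{Y > 2\lambda\},$$

where X ~ P(λ) and Y ~ χ²(2K + 2).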
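Identity (36) is easy to confirm numerically: the left side is a finite Poisson sum, and the right side is an incomplete gamma integral. The sketch below evaluates the integral with a Simpson rule truncated at x = 200. The truncation point, the (K, λ) test pairs, and the function names are assumptions made for illustration.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Left side of (36): P{X <= k} for X ~ P(lam), summed term by term."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k + 1))


def incomplete_gamma_form(k: int, lam: float, upper: float = 200.0, m: int = 100_000) -> float:
    """Right side of (36): (1/k!) times the integral of exp(-x) x^k from lam to `upper`.

    Simpson's rule with m (even) subintervals; `upper` replaces the infinite limit.
    """
    def f(x: float) -> float:
        return math.exp(-x) * x ** k

    h = (upper - lam) / m
    s = f(lam) + f(upper)
    s += sum((4 if i % 2 else 2) * f(lam + i * h) for i in range(1, m))
    return s * h / 3 / math.factorial(k)


for k, lam in [(0, 1.0), (3, 2.5), (7, 4.0)]:
    print(k, lam, poisson_cdf(k, lam), incomplete_gamma_form(k, lam))
```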


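5.3.3 Beta Distribution
The integral

$$B(\alpha, \beta) = \int_{0+}^{1-} x^{\alpha-1}(1 - x)^{\beta-1}\,dx \tag{37}$$

converges for α > 0, β > 0 and is called a beta function. For α ≤ 0 or β ≤ 0 the integral in (37) diverges. It is easy to see that for α > 0, β > 0,

$$B(\alpha, \beta) = B(\beta, \alpha), \tag{38}$$

$$B(\alpha, \beta) = \int_{0+}^{\infty} x^{\alpha-1}(1 + x)^{-\alpha-\beta}\,dx, \tag{39}$$

and

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \tag{40}$$

It follows that

$$f(x) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1 - x)^{\beta-1}, & 0 < x < 1,\\ 0, & \text{otherwise}, \end{cases} \tag{41}$$

defines a PDF.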


Definition 4. An RV X with PDF given by (41) is said to have a beta distribution with parameters α and β, α > 0, β > 0. We will write X ~ B(α, β) for a beta variable with density (41).


Figure 3 gives graphs of some beta PDFs. 











Fig. 3. Beta density functions 
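The DF of a B(α, β) RV is given by

$$F(x) = \begin{cases} 0, & x \le 0,\\ \dfrac{1}{B(\alpha, \beta)}\displaystyle\int_{0+}^{x} y^{\alpha-1}(1 - y)^{\beta-1}\,dy, & 0 < x < 1,\\ 1, & x \ge 1. \end{cases} \tag{42}$$

If n is a positive number, then

$$EX^n = \frac{1}{B(\alpha, \beta)}\int_0^1 x^{n+\alpha-1}(1 - x)^{\beta-1}\,dx = \frac{B(n + \alpha, \beta)}{B(\alpha, \beta)} = \frac{\Gamma(n + \alpha)\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(n + \alpha + \beta)}, \tag{43}$$

using (40). In particular,

$$EX = \frac{\alpha}{\alpha + \beta} \tag{44}$$

and

$$\operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}. \tag{45}$$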


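For the MGF of X ~ B(α, β), we have

$$M(t) = \frac{1}{B(\alpha, \beta)}\int_0^1 e^{tx} x^{\alpha-1}(1 - x)^{\beta-1}\,dx. \tag{46}$$

Since moments of all orders exist, and $E|X|^j \le 1$ for all j, we have

$$M(t) = \sum_{j=0}^{\infty}\frac{t^j}{j!}\, EX^j = \sum_{j=0}^{\infty}\frac{t^j}{\Gamma(j + 1)}\,\frac{\Gamma(\alpha + j)\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + j)\Gamma(\alpha)}. \tag{47}$$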


Remark 1. Note that in the special case where α = β = 1 we get the uniform distribution on (0, 1).

Remark 2. If X is a beta RV with parameters α and β, then 1 − X is a beta variate with parameters β and α. In particular, X is B(α, α) if and only if 1 − X is B(α, α). A special case is the uniform distribution on (0, 1). If X and 1 − X have the same distribution, it does not follow that X has to be B(α, α). All this entails is that
the PDF satisfies

$$f(x) = f(1 - x), \quad 0 < x < 1.$$

Take

$$f(x) = \frac{x^{\alpha-1}(1 - x)^{\beta-1} + x^{\beta-1}(1 - x)^{\alpha-1}}{B(\alpha, \beta) + B(\beta, \alpha)}, \quad 0 < x < 1.$$

Example 3. Let X be distributed with PDF

$$f(x) = \begin{cases} 12x^2(1 - x), & 0 < x < 1,\\ 0, & \text{otherwise}. \end{cases}$$

Then X ~ B(3, 2) and

$$EX^n = \frac{\Gamma(n + 3)\Gamma(5)}{\Gamma(3)\Gamma(n + 5)} = \frac{4!\,(n + 2)!}{2!\,(n + 4)!} = \frac{12}{(n + 4)(n + 3)},$$

$$EX = \frac{12}{20} = \frac{3}{5}, \qquad \operatorname{var}(X) = \frac{12}{30} - \frac{9}{25} = \frac{1}{25},$$

$$M(t) = \sum_{j=0}^{\infty}\frac{t^j}{j!}\,\frac{4!\,(j + 2)!}{2!\,(j + 4)!} = 12\sum_{j=0}^{\infty}\frac{t^j}{j!\,(j + 3)(j + 4)},$$

and

$$P\{0.2 < X < 0.5\} = 12\int_{0.2}^{0.5}(x^2 - x^3)\,dx = 0.2853.$$
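The arithmetic of Example 3 can be reproduced from (40) and (43). In this sketch `beta_fn` and `beta_moment` are hypothetical helper names. The probability is evaluated from the antiderivative 12(x³/3 − x⁴/4) rather than by numerical integration.

```python
import math


def beta_fn(a: float, b: float) -> float:
    """B(a, b) = Γ(a)Γ(b) / Γ(a + b), equation (40)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_moment(n: float, a: float, b: float) -> float:
    """E X^n = B(n + a, b) / B(a, b), equation (43)."""
    return beta_fn(n + a, b) / beta_fn(a, b)


a, b = 3, 2
print(1 / beta_fn(a, b))  # normalizing constant 12 in f(x) = 12 x^2 (1 - x)
mean = beta_moment(1, a, b)
var = beta_moment(2, a, b) - mean ** 2
print(mean, var)          # 3/5 and 1/25, up to floating-point rounding


def antiderivative(x: float) -> float:
    """12 (x^3/3 - x^4/4), an antiderivative of the PDF on (0, 1)."""
    return 12 * (x ** 3 / 3 - x ** 4 / 4)


prob = antiderivative(0.5) - antiderivative(0.2)
print(round(prob, 4))     # P{0.2 < X < 0.5}
```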


Theorem 14. Let X and Y be independent G(α₁, β) and G(α₂, β), respectively, RVs. Then X/(X + Y) is a B(α₁, α₂) RV.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with the uniform distribution on [0, 1]. Let $X_{(k)}$ be the kth-order statistic.

Theorem 15. The RV $X_{(k)}$ has a beta distribution with parameters α = k and β = n − k + 1.


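Proof. Let X be the number of $X_i$'s that lie in [0, t]. Then X is b(n, t). We have

$$P\{X_{(k)} \le t\} = P\{X \ge k\} = \sum_{j=k}^{n}\binom{n}{j} t^j(1 - t)^{n-j}.$$

Also,

$$\frac{d}{dt}\left[\binom{n}{j} t^j(1 - t)^{n-j}\right] = \binom{n}{j}\left[j t^{j-1}(1 - t)^{n-j} - (n - j)t^j(1 - t)^{n-j-1}\right] = n\left[\binom{n-1}{j-1} t^{j-1}(1 - t)^{n-j} - \binom{n-1}{j} t^j(1 - t)^{n-j-1}\right],$$

so that, summing over j = k, ..., n (the sum telescopes, and the term with $\binom{n-1}{n}$ is zero),

$$\frac{d}{dt} P\{X_{(k)} \le t\} = n\binom{n-1}{k-1} t^{k-1}(1 - t)^{n-k}.$$

On integration, we get

$$P\{X_{(k)} \le t\} = n\binom{n-1}{k-1}\int_0^t x^{k-1}(1 - x)^{n-k}\,dx,$$

and since $n\binom{n-1}{k-1} = 1/B(k, n - k + 1)$, this is the DF of a B(k, n − k + 1) RV, as asserted.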


Remark 3. Note that we have shown that, if X is b(n, p), then

$$1 - P\{X < k\} = n\binom{n-1}{k-1}\int_0^p x^{k-1}(1 - x)^{n-k}\,dx, \tag{48}$$

which expresses the DF of X in terms of the DF of a B(k, n − k + 1) RV.
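Relation (48) ties the binomial tail to a beta integral, and it can be verified numerically. Equivalently, it gives P{X_(k) ≤ p} for the uniform order statistic of Theorem 15. The sketch below uses `math.comb` (Python 3.8 or later) for the binomial coefficients and a Simpson rule for the integral. The (n, k, p) triples are arbitrary test cases.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """Left side of (48): 1 - P{X < k} = P{X >= k} for X ~ b(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def beta_integral(k: int, n: int, p: float, m: int = 10_000) -> float:
    """Right side of (48): n C(n-1, k-1) times the integral of x^(k-1) (1-x)^(n-k) over [0, p]."""
    def f(x: float) -> float:
        return x ** (k - 1) * (1 - x) ** (n - k)

    h = p / m
    s = f(0.0) + f(p)
    s += sum((4 if i % 2 else 2) * f(i * h) for i in range(1, m))
    return n * math.comb(n - 1, k - 1) * s * h / 3


for n, k, p in [(10, 3, 0.3), (20, 15, 0.7), (5, 1, 0.4)]:
    print(n, k, p, binomial_tail(k, n, p), beta_integral(k, n, p))
```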


Theorem 16. Let $X_1, X_2, \ldots, X_n$ be independent RVs. Then $X_1, X_2, \ldots, X_n$ are iid B(α, 1) RVs if and only if $X_{(n)} \sim B(\alpha n, 1)$.


5.3.4 Cauchy Distribution

Definition 5. An RV X is said to have a Cauchy distribution with parameters μ and θ if its PDF is given by

$$f(x) = \frac{\mu}{\pi}\,\frac{1}{\mu^2 + (x - \theta)^2}, \quad -\infty < x < \infty,\ \mu > 0. \tag{49}$$

We will write X ~ C(μ, θ) for a Cauchy RV with density (49).

Figure 4 gives the graph of a Cauchy PDF.
We first check that (49) in fact defines a PDF. Substituting y = (x − θ)/μ, we get

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{dy}{1 + y^2} = \frac{2}{\pi}\int_0^{\infty}\frac{dy}{1 + y^2} = \frac{2}{\pi}\cdot\frac{\pi}{2} = 1.$$
Fig. 4. Cauchy density function.

The DF of a C(1, 0) RV is given by

$$F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}x, \quad -\infty < x < \infty. \tag{50}$$
л 


Theorem 17. Let X be a Cauchy RV with parameters μ and θ. The moments of order < 1 exist, but the moments of order ≥ 1 do not exist for the RV X.

Proof. It suffices to consider the PDF

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \quad -\infty < x < \infty.$$

For α ≥ 0,

$$E|X|^{\alpha} = \frac{2}{\pi}\int_0^{\infty}\frac{x^{\alpha}}{1 + x^2}\,dx,$$

and, letting z = 1/(1 + x²) in the integral, we get

$$E|X|^{\alpha} = \frac{1}{\pi}\int_0^1 z^{(1-\alpha)/2 - 1}(1 - z)^{(1+\alpha)/2 - 1}\,dz,$$

which converges for α < 1 and diverges for α ≥ 1. This completes the proof of the theorem.


It follows from Theorem 17 that the MGF of a Cauchy RV does not exist. This creates some manipulative problems. We note, however, that the cf of X ~ C(μ, θ) is given by

$$\varphi(t) = e^{it\theta - \mu|t|}. \tag{51}$$
Theorem 18. Let X ~ C(μ₁, θ₁) and Y ~ C(μ₂, θ₂) be independent RVs. Then X + Y is a C(μ₁ + μ₂, θ₁ + θ₂) RV.

Proof. For notational convenience we will prove the result in the special case where μ₁ = μ₂ = 1 and θ₁ = θ₂ = 0, that is, where X and Y have the common PDF

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \quad -\infty < x < \infty.$$

The proof in the general case follows along the same lines. If Z = X + Y, the PDF of Z is given by

$$f_Z(z) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{1}{1 + x^2}\,\frac{1}{1 + (z - x)^2}\,dx.$$

Now

$$\frac{1}{(1 + x^2)[1 + (z - x)^2]} = \frac{1}{z^2(z^2 + 4)}\left[\frac{2zx}{1 + x^2} + \frac{z^2}{1 + x^2} + \frac{2z^2 - 2zx}{1 + (z - x)^2} + \frac{z^2}{1 + (z - x)^2}\right],$$

so that

$$f_Z(z) = \frac{1}{\pi^2 z^2(z^2 + 4)}\left[z\log\frac{1 + x^2}{1 + (z - x)^2} + z^2\tan^{-1}x - z^2\tan^{-1}(z - x)\right]_{-\infty}^{\infty} = \frac{2\pi z^2}{\pi^2 z^2(z^2 + 4)} = \frac{2}{\pi}\,\frac{1}{z^2 + 4}, \quad -\infty < z < \infty.$$

It follows that if X and Y are iid C(1, 0) RVs, then X + Y is a C(2, 0) RV. We note that the result follows effortlessly from (51).


Corollary. Let $X_1, X_2, \ldots, X_n$ be independent Cauchy RVs, $X_k \sim C(\mu_k, \theta_k)$, k = 1, 2, ..., n. Then $S_n = \sum_{k=1}^{n} X_k$ is a $C\left(\sum_{k=1}^{n}\mu_k, \sum_{k=1}^{n}\theta_k\right)$ RV.

In particular, if $X_1, X_2, \ldots, X_n$ are iid C(1, 0) RVs, $n^{-1}S_n$ is also a C(1, 0) RV. This is a remarkable result, the importance of which will become clear in Chapter 6. Actually, this property uniquely characterizes the Cauchy distribution. If F is a nondegenerate DF with the property that $n^{-1}S_n$ also has DF F, then F must be a Cauchy distribution (see Thompson [112, p. 112]).
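The fact that n^{−1}S_n is again C(1, 0) means that averaging iid Cauchy observations does not concentrate them. A small seeded simulation in Python makes this visible. It generates C(1, 0) draws by the tangent construction of Theorem 20 and estimates P{|n^{−1}S_n| ≤ 1}, which by (50) should stay near F(1) − F(−1) = 1/2 for every n. The function names, seed, and replication count are illustrative assumptions, and the output is a Monte Carlo estimate, not a proof.

```python
import math
import random


def cauchy_draw(rng: random.Random) -> float:
    """A C(1, 0) draw via Y = tan(X), X uniform on (-π/2, π/2) (Theorem 20)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def frac_mean_within_1(n: int, reps: int, seed: int = 2) -> float:
    """Estimate P{|n^{-1} S_n| <= 1} for iid C(1, 0) summands."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(cauchy_draw(rng) for _ in range(n)) / n
        hits += abs(mean) <= 1.0
    return hits / reps


for n in (1, 10, 100):
    print(n, frac_mean_within_1(n, 10_000))  # stays near 1/2 instead of growing toward 1
```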

The proof of the following result is simple.

Theorem 19. Let X be C(μ, 0). Then λ/X, where λ is a nonzero constant, is a C(|λ|/μ, 0) RV.

Corollary. X is C(1, 0) if and only if 1/X is C(1, 0).
We emphasize that if X and 1/X have the same PDF on (−∞, ∞), it does not follow* that X is C(1, 0), for let X be an RV with PDF

$$f(x) = \begin{cases} \dfrac{1}{4}, & \text{if } |x| \le 1,\\[4pt] \dfrac{1}{4x^2}, & \text{if } |x| > 1. \end{cases}$$

Then X and 1/X have the same PDF, as can easily be checked.

Theorem 20. Let X be a U(−π/2, π/2) RV. Then Y = tan X is a Cauchy RV.

Many important properties of the Cauchy distribution can be derived from this result (see Pitman and Williams [78]).
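Theorem 20 can be checked without randomness. Push an evenly spaced grid of points in (−π/2, π/2) through tan and compare the proportion of images at or below y with the C(1, 0) DF (50). Because tan is increasing, the proportion differs from F(y) by at most about 1/N. The grid size N and the helper `cauchy_df` are our own choices for this sketch.

```python
import math


def cauchy_df(y: float) -> float:
    """DF (50) of C(1, 0): F(y) = 1/2 + arctan(y)/π."""
    return 0.5 + math.atan(y) / math.pi


# Grid u_i = (i + 1/2)/N in (0, 1), mapped to (-π/2, π/2) and then through tan.
N = 200_000
ys = [math.tan(math.pi * ((i + 0.5) / N - 0.5)) for i in range(N)]

for y0 in (-3.0, -1.0, 0.0, 0.5, 2.0):
    proportion = sum(y <= y0 for y in ys) / N
    print(y0, round(proportion, 4), round(cauchy_df(y0), 4))
```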


5.3.5 Normal Distribution (Gaussian Law) 


One of the most important distributions in the study of probability and mathematical 
statistics is the normal distribution, which we examine presently. 


Definition 6. An RV X is said to have a standard normal distribution if its PDF is given by

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty. \tag{52}$$





We first check that φ defines a PDF. Let

$$I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx.$$

Then, since x²/2 − |x| + 1/2 = (|x| − 1)²/2 ≥ 0,

$$0 < e^{-x^2/2} \le e^{-|x| + 1/2}, \quad -\infty < x < \infty,$$

$$\int_{-\infty}^{\infty} e^{-|x| + 1/2}\,dx = 2e^{1/2},$$

and it follows that I exists. Writing $I = 2\int_0^{\infty} e^{-x^2/2}\,dx$, substituting y = x², and using (12) with α = 1/2, β = 2, we have

$$I = \int_0^{\infty} y^{-1/2} e^{-y/2}\,dy = 2^{1/2}\,\Gamma\left(\tfrac{1}{2}\right) = \sqrt{2\pi}.$$

Thus $\int_{-\infty}^{\infty}\varphi(x)\,dx = 1$, as required.

*Menon [71] has shown that we need the condition that both X and 1/X be stable to conclude that X is Cauchy. A nondegenerate distribution function F is said to be stable if for two iid RVs $X_1, X_2$ with common DF F, and given constants $a_1, a_2 > 0$, we can find α > 0 and $\beta(a_1, a_2)$ such that the RV

$$X_3 = \alpha^{-1}(a_1X_1 + a_2X_2 - \beta)$$

again has the same distribution F. Examples are the Cauchy (see the corollary to Theorem 18) and normal (discussed in Section 5.3.5) distributions.
Let us write Y = σX + μ, where σ > 0. Then the PDF of Y is given by

$$f_Y(y) = \frac{1}{\sigma}\,\varphi\left(\frac{y - \mu}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right], \quad -\infty < y < \infty;\ \sigma > 0,\ -\infty < \mu < \infty. \tag{53}$$

Definition 7. An RV X is said to have a normal distribution with parameters μ (−∞ < μ < ∞) and σ (> 0) if its PDF is given by (53).

If X is a normally distributed RV with parameters μ and σ, we will write X ~ N(μ, σ²). In this notation, φ defined by (52) is the PDF of an N(0, 1) RV. The DF of an N(0, 1) RV will be denoted by Φ(x), where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du. \tag{54}$$

Clearly, if X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1). Z is called a standard normal RV. For the MGF of an N(μ, σ²) RV, we have


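$$M(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left[tx - \frac{(x - \mu)^2}{2\sigma^2}\right]dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left[-\frac{(x - \mu - \sigma^2 t)^2}{2\sigma^2} + \mu t + \frac{\sigma^2 t^2}{2}\right]dx = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \tag{55}$$

for all real values of t. Moments of all orders exist and may be computed from the MGF. Thus

$$EX = M'(t)\big|_{t=0} = (\mu + \sigma^2 t)M(t)\big|_{t=0} = \mu \tag{56}$$

and

$$EX^2 = M''(t)\big|_{t=0} = \left[\sigma^2 M(t) + (\mu + \sigma^2 t)^2 M(t)\right]_{t=0} = \sigma^2 + \mu^2. \tag{57}$$

Thus

$$\operatorname{var}(X) = \sigma^2. \tag{58}$$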


Clearly, the central moments of odd order are all zero. The central moments of even order are as follows (n is a positive integer):

$$E(X - \mu)^{2n} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}(x - \mu)^{2n} e^{-(x - \mu)^2/(2\sigma^2)}\,dx = \frac{2^n\sigma^{2n}}{\sqrt{\pi}}\,\Gamma\left(n + \tfrac{1}{2}\right) = [(2n - 1)(2n - 3)\cdots 3\cdot 1]\,\sigma^{2n}. \tag{59}$$

As for the absolute moment of order α, for a standard normal RV Z we have

$$E|Z|^{\alpha} = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} z^{\alpha} e^{-z^2/2}\,dz = \frac{2^{(\alpha + 1)/2}}{\sqrt{2\pi}}\int_0^{\infty} y^{(\alpha - 1)/2} e^{-y}\,dy = \frac{2^{\alpha/2}\,\Gamma[(\alpha + 1)/2]}{\sqrt{\pi}}. \tag{60}$$

As remarked earlier, the normal distribution is one of the most important distributions in probability and statistics, and for this reason the standard normal distribution is available in tabular form. Table ST2 at the end of the book gives the probability P{Z > z} for various values of z (> 0) in the tail of an N(0, 1) RV. In this book we write $z_\alpha$ for the value of Z that satisfies α = P{Z > z_α}, 0 < α < 1.


Example 4. By Chebychev's inequality, if E|X|² < ∞, EX = μ, and var(X) = σ², then

$$P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}.$$

For K = 2, we get P{|X − μ| ≥ Kσ} ≤ 0.25, and for K = 3, we have P{|X − μ| ≥ Kσ} ≤ 1/9. If X is, in particular, N(μ, σ²), then

$$P\{|X - \mu| \ge K\sigma\} = P\{|Z| \ge K\},$$

where Z is N(0, 1). From Table ST2,

$$P\{|Z| \ge 1\} = 0.317, \quad P\{|Z| \ge 2\} = 0.046, \quad\text{and}\quad P\{|Z| \ge 3\} = 0.003.$$

Thus practically all the distribution is concentrated within three standard deviations of the mean.


Example 5. Let X ~ N(3, 4). Then

$$P\{2 < X \le 5\} = P\left\{\frac{2 - 3}{2} < \frac{X - 3}{2} \le \frac{5 - 3}{2}\right\} = P\{-0.5 < Z \le 1\}$$

$$= P\{Z \le 1\} - P\{Z \le -0.5\} = 0.841 - P\{Z > 0.5\} = 0.841 - 0.309 = 0.532.$$
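The table lookups in Examples 4 and 5 can be reproduced with the error function from Python's standard `math` module, using P{|Z| ≥ K} = erfc(K/√2) and Φ(x) = [1 + erf(x/√2)]/2. The sketch is only a cross-check of the tabulated values. A difference in the last digit (0.5328 here against 0.532 in the text) comes from rounding the table entries.

```python
import math


def Phi(x: float) -> float:
    """Standard normal DF (54), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Example 4: Chebychev bound 1/K^2 versus the exact normal tail P{|Z| >= K} = erfc(K/√2).
for K in (1, 2, 3):
    print(K, round(1 / K ** 2, 3), round(math.erfc(K / math.sqrt(2.0)), 4))

# Example 5: X ~ N(3, 4), so σ = 2 and P{2 < X <= 5} = Φ(1) - Φ(-0.5).
mu, sigma = 3.0, 2.0
p = Phi((5 - mu) / sigma) - Phi((2 - mu) / sigma)
print(round(p, 4))
```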


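Theorem 21 (Feller [22, p. 175]). Let Z be a standard normal RV. Then

$$P\{Z > x\} \sim \frac{1}{\sqrt{2\pi}}\,\frac{e^{-x^2/2}}{x} \quad\text{as } x \to \infty. \tag{61}$$

More precisely, for every x > 0,

$$\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\left(\frac{1}{x} - \frac{1}{x^3}\right) < P\{Z > x\} < \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,\frac{1}{x}. \tag{62}$$

Proof. We have

$$\frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-y^2/2}\left(1 + \frac{1}{y^2}\right)dy = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,\frac{1}{x} \tag{63}$$

and

$$\frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-y^2/2}\left(1 - \frac{3}{y^4}\right)dy = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\left(\frac{1}{x} - \frac{1}{x^3}\right), \tag{64}$$

as can be checked on differentiation. Since 1 − 3/y⁴ < 1 < 1 + 1/y², comparing the integrands in (63) and (64) with the integrand of $P\{Z > x\} = (2\pi)^{-1/2}\int_x^{\infty} e^{-y^2/2}\,dy$ yields (62). Approximation (61) follows immediately.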
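The two-sided bound (62) and the asymptotic statement (61) can be tabulated numerically. In this Python sketch the exact tail P{Z > x} is computed as erfc(x/√2)/2 with `math.erfc`. The grid of x values is arbitrary, and the last column, the ratio of the tail to the upper bound, should approach 1 as x grows.

```python
import math


def normal_tail(x: float) -> float:
    """P{Z > x} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def feller_bounds(x: float) -> tuple:
    """Lower and upper bounds in (62), valid for x > 0."""
    c = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return c * (1 / x - 1 / x ** 3), c / x


for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    lower, upper = feller_bounds(x)
    tail = normal_tail(x)
    print(x, lower, tail, upper, tail / upper)
```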


Theorem 22. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_k \sim N(\mu_k, \sigma_k^2)$, k = 1, 2, ..., n. Then $S_n = \sum_{k=1}^{n} X_k$ is an $N\left(\sum_{k=1}^{n}\mu_k, \sum_{k=1}^{n}\sigma_k^2\right)$ RV.

Corollary 1. If $X_1, X_2, \ldots, X_n$ are iid N(μ, σ²) RVs, then $S_n$ is an N(nμ, nσ²) RV and $n^{-1}S_n$ is an N(μ, σ²/n) RV.

Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid N(0, 1) RVs, then $n^{-1/2}S_n$ is also an N(0, 1) RV.

We remark that if $X_1, X_2, \ldots, X_n$ are iid RVs with EX = 0, EX² = 1 such that $n^{-1/2}S_n$ also has the same distribution for each n = 1, 2, ..., that distribution can only be N(0, 1). This characterization of the normal distribution will become clear when we study the central limit theorem in Chapter 6.


Theorem 23. Let X and Y be independent RVs. Then X + Y is normally distributed if and only if X and Y are both normal.

If X and Y are independent normal RVs, X + Y is normal by Theorem 22. The converse is due to Cramér [15] and will not be proved here.

Theorem 24. Let X and Y be independent RVs with common N(0, 1) distribution. Then X + Y and X − Y are independent.

The converse is due to Bernstein [3] and is stated here without proof.

Theorem 25. If X and Y are independent RVs with the same distribution, and if $Z_1 = X + Y$ and $Z_2 = X - Y$ are independent, all RVs X, Y, $Z_1$, and $Z_2$ are normally distributed.

The following result generalizes Theorem 24.

Theorem 26. If $X_1, X_2, \ldots, X_n$ are independent normal RVs and $\sum_{i=1}^{n} a_i b_i \operatorname{var}(X_i) = 0$, then $L_1 = \sum_{i=1}^{n} a_iX_i$ and $L_2 = \sum_{i=1}^{n} b_iX_i$ are independent. Here $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are fixed (nonzero) real numbers.


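Proof. Let $\operatorname{var}(X_i) = \sigma_i^2$, and assume without loss of generality that $EX_i = 0$, i = 1, 2, ..., n. For any real numbers α, β, and t,

$$E\,e^{(\alpha L_1 + \beta L_2)t} = E\exp\left[t\sum_{i=1}^{n}(\alpha a_i + \beta b_i)X_i\right] = \prod_{i=1}^{n}\exp\left[\frac{(\alpha a_i + \beta b_i)^2\sigma_i^2 t^2}{2}\right]$$

$$= \exp\left[\frac{\alpha^2 t^2}{2}\sum_{i=1}^{n} a_i^2\sigma_i^2 + \frac{\beta^2 t^2}{2}\sum_{i=1}^{n} b_i^2\sigma_i^2\right] \quad\left(\text{since } \sum_{i=1}^{n} a_i b_i\sigma_i^2 = 0\right)$$

$$= \prod_{i=1}^{n} E\,e^{\alpha t a_i X_i}\prod_{i=1}^{n} E\,e^{\beta t b_i X_i} = E\exp\left(\alpha t\sum_{i=1}^{n} a_iX_i\right)E\exp\left(\beta t\sum_{i=1}^{n} b_iX_i\right) = E\,e^{\alpha t L_1}\,E\,e^{\beta t L_2}.$$

Thus, writing M for the joint MGF of $(L_1, L_2)$, we have shown that

$$M(\alpha t, \beta t) = M(\alpha t, 0)\,M(0, \beta t) \quad\text{for all } \alpha, \beta, t.$$

It follows that $L_1$ and $L_2$ are independent.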


Corollary. If $X_1, X_2$ are independent N(μ₁, σ²) and N(μ₂, σ²) RVs, then $X_1 - X_2$ and $X_1 + X_2$ are independent. (This gives Theorem 24.)

Darmois [19] and Skitovitch [104] provided the converse of Theorem 26, which we state without proof.

Theorem 27. If $X_1, X_2, \ldots, X_n$ are independent RVs, $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers none of which equals zero, and if the linear forms

$$L_1 = \sum_{i=1}^{n} a_iX_i \quad\text{and}\quad L_2 = \sum_{i=1}^{n} b_iX_i$$

are independent, all the RVs are normally distributed.

Corollary. If X and Y are independent RVs such that X + Y and X − Y are independent, X, Y, X + Y, and X − Y are all normal.


Yet another result of this type is the following theorem.

Theorem 28. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then the common distribution is normal if and only if

$$S_n = \sum_{k=1}^{n} X_k \quad\text{and}\quad \sum_{i=1}^{n}\left(X_i - n^{-1}S_n\right)^2$$

are independent.

In Chapter 7 we prove the necessity part of this result, which is basic to the theory of t-tests in statistics (Chapter 10; see also Example 4.4.6). The sufficiency part was proved by Lukacs [65], and we will not prove it here.

Theorem 29. X ~ N(0, 1) ⟹ X² ~ χ²(1).

See Example 2.5.7 for the proof.

Corollary 1. If X ~ N(μ, σ²), the RV Z² = (X − μ)²/σ² is χ²(1).

Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs and $X_k \sim N(\mu_k, \sigma_k^2)$, k = 1, 2, ..., n, then $\sum_{k=1}^{n}(X_k - \mu_k)^2/\sigma_k^2$ is χ²(n).
Theorem 30. Let X and Y be iid N(0, σ²) RVs. Then X/Y is C(1, 0).

For the proof, see Example 2.5.7.

We remark that the converse of this result does not hold; that is, if Z = X/Y is the quotient of two iid RVs and Z has a C(1, 0) distribution, it does not follow that X and Y are normal, for take X and Y to be iid with PDF

$$f(x) = \frac{\sqrt{2}}{\pi}\,\frac{1}{1 + x^4}, \quad -\infty < x < \infty.$$

We leave the reader to verify that Z = X/Y is C(1, 0).


5.3.6 Some Other Continuous Distributions 


Several other distributions that are related to distributions studied earlier also arise in practice. We record briefly some of these and their important characteristics. We will use these distributions infrequently. We say that X has a lognormal distribution if Y = ln X has a normal distribution. The PDF of X is then

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0, \tag{65}$$

and f(x) = 0 for x ≤ 0, where −∞ < μ < ∞, σ > 0. In fact, for x > 0,

$$P(X \le x) = P(\ln X \le \ln x) = P(Y \le \ln x) = P\left(\frac{Y - \mu}{\sigma} \le \frac{\ln x - \mu}{\sigma}\right) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right),$$

where Φ is the DF of an N(0, 1) RV, which easily leads to (65). It is easily seen that for n > 0,

$$EX^n = \exp\left(n\mu + \frac{n^2\sigma^2}{2}\right), \quad EX = \exp\left(\mu + \frac{\sigma^2}{2}\right), \quad \operatorname{var}(X) = \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2). \tag{66}$$


The MGF of X does not exist. 
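Formula (66) can be checked by writing EX^n = E e^{nY} with Y ~ N(μ, σ²) and integrating numerically against the normal density (53). The sketch below truncates the range to μ ± 12σ and uses a Simpson rule. The values μ = 0.3, σ = 0.5 and the helper name are illustrative assumptions.

```python
import math


def lognormal_moment_numeric(n: int, mu: float, sigma: float, m: int = 20_000) -> float:
    """E X^n = E exp(nY), Y ~ N(mu, sigma^2), by Simpson's rule on [mu - 12 sigma, mu + 12 sigma]."""
    def g(y: float) -> float:
        density = math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        return math.exp(n * y) * density

    a, b = mu - 12 * sigma, mu + 12 * sigma
    h = (b - a) / m
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, m))
    return s * h / 3


mu, sigma = 0.3, 0.5
for n in (1, 2, 3):
    closed_form = math.exp(n * mu + n * n * sigma ** 2 / 2)  # (66)
    print(n, lognormal_moment_numeric(n, mu, sigma), closed_form)
```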
We say that the RV X has a Pareto distribution with parameters θ > 0 and α > 0 if its PDF is given by

$$f(x) = \frac{\alpha\theta^{\alpha}}{(\theta + x)^{\alpha + 1}}, \quad x > 0, \tag{67}$$

and zero otherwise. Here θ is a scale parameter and α is a shape parameter. It is easy to check that

$$F(x) = P(X \le x) = 1 - \frac{\theta^{\alpha}}{(\theta + x)^{\alpha}}, \quad x > 0,$$

$$EX = \frac{\theta}{\alpha - 1}, \quad \alpha > 1, \quad\text{and}\quad \operatorname{var}(X) = \frac{\alpha\theta^2}{(\alpha - 2)(\alpha - 1)^2} \tag{68}$$

for α > 2. The MGF of X does not exist since not all moments of X exist.


Suppose that X has a Pareto distribution with parameters θ and α. Writing Y = ln(X/θ), we see that Y has PDF

$$f_Y(y) = \frac{\alpha e^{y}}{(1 + e^{y})^{\alpha + 1}}, \quad -\infty < y < \infty, \tag{69}$$

and DF

$$F_Y(y) = 1 - (1 + e^{y})^{-\alpha} \quad\text{for all } y.$$

The PDF in (69) is known as a logistic distribution. We introduce location and scale parameters μ and σ by writing Z = μ + σY, taking α = 1, and then the PDF of Z is easily seen to be

$$f_Z(z) = \frac{1}{\sigma}\,\frac{\exp[(z - \mu)/\sigma]}{\{1 + \exp[(z - \mu)/\sigma]\}^2} \tag{70}$$

for all real z. This is the PDF of a logistic RV with location and scale parameters μ and σ. We leave the reader to check that

$$F_Z(z) = \left[1 + \exp\left(-\frac{z - \mu}{\sigma}\right)\right]^{-1},$$

$$EZ = \mu, \quad \operatorname{var}(Z) = \frac{\pi^2\sigma^2}{3}, \tag{71}$$

$$M_Z(t) = \exp(\mu t)\,\Gamma(1 - \sigma t)\,\Gamma(1 + \sigma t), \quad |t| < \frac{1}{\sigma}.$$


The Pareto distribution is also related to the exponential distribution. Let X have Pareto PDF of the form

$$f_X(x) = \frac{\alpha\sigma^{\alpha}}{x^{\alpha + 1}}, \quad x > \sigma, \tag{72}$$

and zero otherwise. A simple transformation leads to PDF (72) from (67): if X has PDF (67), then X + θ has PDF (72) with σ = θ. Then it is easily seen that Y = ln(X/σ) has an exponential distribution with mean 1/α. Thus some properties of the exponential distribution that are preserved under monotone transformations can be derived for Pareto PDF (72) by using the logarithmic transformation.
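Some other distributions are related to the gamma distribution. Suppose that X ~ G(1, β). Let Y = X^{1/α}, α > 0. Then Y has PDF

$$f_Y(y) = \frac{\alpha}{\beta}\, y^{\alpha - 1}\exp\left(-\frac{y^{\alpha}}{\beta}\right), \quad y > 0, \tag{73}$$

and zero otherwise. The RV Y is said to have a Weibull distribution. We leave the reader to show that

$$F_Y(y) = 1 - \exp\left(-\frac{y^{\alpha}}{\beta}\right), \quad y > 0,$$

$$EY^n = \beta^{n/\alpha}\,\Gamma\left(1 + \frac{n}{\alpha}\right), \quad EY = \beta^{1/\alpha}\,\Gamma\left(1 + \frac{1}{\alpha}\right), \tag{74}$$

$$\operatorname{var}(Y) = \beta^{2/\alpha}\left[\Gamma\left(1 + \frac{2}{\alpha}\right) - \Gamma^2\left(1 + \frac{1}{\alpha}\right)\right].$$

The MGF of Y exists only for α ≥ 1, but for α > 1 it does not have a form useful in applications. The special case α = 2 and β = θ² is known as a Rayleigh distribution.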

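Suppose that X has a Weibull distribution with PDF (73). Let Y = ln X. Then Y has DF

$$F_Y(y) = 1 - \exp\left(-\frac{e^{\alpha y}}{\beta}\right), \quad -\infty < y < \infty.$$

Setting θ = (1/α) ln β and σ = 1/α, we get

$$F_Y(y) = 1 - \exp\left[-\exp\left(\frac{y - \theta}{\sigma}\right)\right] \tag{75}$$

with PDF

$$f_Y(y) = \frac{1}{\sigma}\exp\left[\frac{y - \theta}{\sigma} - \exp\left(\frac{y - \theta}{\sigma}\right)\right] \tag{76}$$

for −∞ < y < ∞ and σ > 0. An RV with PDF (76) is called an extreme value distribution with location and scale parameters θ and σ. It can be shown that

$$EY = \theta - \gamma\sigma, \quad \operatorname{var}(Y) = \frac{\pi^2\sigma^2}{6}, \quad M_Y(t) = e^{\theta t}\,\Gamma(1 + \sigma t), \tag{77}$$

where γ ≈ 0.577216 is the Euler constant.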
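The constants in (77) can be checked by numerical integration of the PDF (76). The sketch assumes θ = 1 and σ = 2 purely for illustration. It truncates the range to [θ − 40σ, θ + 5σ], where the neglected probability is negligible, and hard-codes Euler's constant γ = 0.5772156649015329.

```python
import math

EULER_GAMMA = 0.5772156649015329


def ev_pdf(y: float, theta: float, sigma: float) -> float:
    """Extreme value PDF (76)."""
    z = (y - theta) / sigma
    return math.exp(z - math.exp(z)) / sigma


def simpson(f, a: float, b: float, m: int = 40_000) -> float:
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


theta, sigma = 1.0, 2.0
a, b = theta - 40 * sigma, theta + 5 * sigma
mass = simpson(lambda y: ev_pdf(y, theta, sigma), a, b)
mean = simpson(lambda y: y * ev_pdf(y, theta, sigma), a, b)
var = simpson(lambda y: (y - mean) ** 2 * ev_pdf(y, theta, sigma), a, b)
print(mass, mean, theta - EULER_GAMMA * sigma, var, math.pi ** 2 * sigma ** 2 / 6)
```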


The final distribution we consider is also related to a G(1, β) RV. Let $f_1$ be the PDF of G(1, β) and $f_2$ the PDF

$$f_2(x) = \frac{1}{\beta}\exp\left(\frac{x}{\beta}\right), \quad x < 0, \quad = 0 \text{ otherwise}.$$

Clearly, $f_2$ is also an exponential PDF defined on (−∞, 0). Consider the mixture PDF

$$f(x) = \frac{1}{2}\left[f_1(x) + f_2(x)\right], \quad -\infty < x < \infty. \tag{78}$$

Clearly,

$$f(x) = \frac{1}{2\beta}\exp\left(-\frac{|x|}{\beta}\right), \quad -\infty < x < \infty, \tag{79}$$

and the PDF f defined in (79) is called a Laplace or double exponential PDF. It is convenient to introduce a location parameter μ and consider instead the PDF

$$f(x) = \frac{1}{2\beta}\exp\left(-\frac{|x - \mu|}{\beta}\right), \quad -\infty < x < \infty, \tag{80}$$

where −∞ < μ < ∞, β > 0. It is easy to see that for an RV X with PDF (80), we have

$$EX = \mu, \quad \operatorname{var}(X) = 2\beta^2, \quad\text{and}\quad M(t) = e^{\mu t}(1 - \beta^2 t^2)^{-1} \tag{81}$$

for |t| < 1/β.
For completeness let us define a mixture PDF (or PMF). Let g(x|θ) be a PDF and let h(θ) be a mixing PDF. Then the PDF

$$f(x) = \int g(x|\theta)\,h(\theta)\,d\theta \tag{82}$$

is called a mixture density function. If h is a PMF with support set {θ₁, θ₂, ..., θ_k}, then (82) reduces to a finite mixture density function

$$f(x) = \sum_{i=1}^{k} g(x|\theta_i)\,h(\theta_i). \tag{83}$$

The quantities h(θ_i) are called mixing proportions. The PDF (78) is an example with k = 2, h(θ₁) = h(θ₂) = 1/2, g(x|θ₁) = f₁(x), and g(x|θ₂) = f₂(x).
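The finite-mixture description (83) of the Laplace PDF gives a simple way to simulate it: pick a component with probability h(θ_i) = 1/2, then draw from that component. The sketch below shifts both components by μ to obtain (80). It relies on `random.expovariate` (which takes the rate 1/β) and compares the sample mean and variance with EX = μ and var(X) = 2β² from (81). The helper name, seed, and parameter values are illustrative assumptions.

```python
import random


def laplace_via_mixture(mu: float, beta: float, rng: random.Random) -> float:
    """Draw from (80) as the finite mixture (83) with k = 2 and mixing proportions 1/2.

    With probability 1/2 use f1 (exponential on (0, ∞)); otherwise use f2, its mirror
    image on (-∞, 0). Both components are shifted by mu.
    """
    e = rng.expovariate(1.0 / beta)  # G(1, beta) draw with mean beta
    return mu + (e if rng.random() < 0.5 else -e)


rng = random.Random(3)
mu, beta, reps = 1.0, 2.0, 200_000
xs = [laplace_via_mixture(mu, beta, rng) for _ in range(reps)]
mean = sum(xs) / reps
var = sum((x - mean) ** 2 for x in xs) / (reps - 1)
print(round(mean, 3), round(var, 3))  # compare with EX = mu = 1 and var(X) = 2 beta^2 = 8
```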


PROBLEMS 5.3 


1. Prove Theorem 1.

2. Let X be an RV with PMF $p_k = P\{X = k\}$ given below. If F is the corresponding DF, find the distribution of F(X) in the following cases:

(a) $p_k = \binom{n}{k} p^k(1 - p)^{n-k}$, k = 0, 1, 2, ..., n; 0 < p < 1.
(b) $p_k = e^{-\lambda}(\lambda^k/k!)$, k = 0, 1, 2, ...; λ > 0.
3. Let $Y_1 \sim U[0, 1]$, $Y_2 \sim U[0, Y_1]$, ..., $Y_n \sim U[0, Y_{n-1}]$. Show that

$$Y_1 \sim X_1, \quad Y_2 \sim X_1X_2, \quad \ldots, \quad Y_n \sim X_1X_2\cdots X_n,$$

where $X_1, X_2, \ldots, X_n$ are iid U[0, 1] RVs. If U is the number of $Y_1, Y_2, \ldots, Y_n$ in [t, 1], where 0 < t < 1, show that U has a Poisson distribution with parameter −log t.


4. Let $X_1, X_2, \ldots, X_n$ be iid U[0, 1] RVs. Prove by induction or otherwise that $S_n = \sum_{k=1}^{n} X_k$ has the PDF

$$f_n(x) = [(n - 1)!]^{-1}\sum_{k=0}^{n}\binom{n}{k}(-1)^k\,\varepsilon(x - k)\,(x - k)^{n-1},$$

where ε(x) = 1 if x ≥ 0, = 0 if x < 0.


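5. (a) Let X be an RV with PMF $p_j = P\{X = x_j\}$, j = 0, 1, 2, ..., where $x_0 < x_1 < x_2 < \cdots$, and let F be the DF of X. Show that

$$EF(X) = \frac{1}{2}\left(1 + \sum_{j=0}^{\infty} p_j^2\right)$$

and

$$\operatorname{var} F(X) = \sum_{j=0}^{\infty} p_j q_{j+1}^2 - \frac{1}{4}\left(1 - \sum_{j=0}^{\infty} p_j^2\right)^2,$$

where $q_{j+1} = \sum_{k=j+1}^{\infty} p_k$.
(b) Let $p_j > 0$ for j = 0, 1, ..., N and $\sum_{j=0}^{N} p_j = 1$. Show that

$$EF(X) \ge \frac{N + 2}{2(N + 1)}$$

with equality if and only if $p_j = 1/(N + 1)$ for all j. (Rohatgi [89])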
6. Prove (a) Theorem 6 and its corollary, and (b) Theorem 10.


7. Let X be a nonnegative RV of the continuous type, and let Y ~ U(0, X). Also, let Z = X − Y. Then the RVs Y and Z are independent if and only if X is G(2, 1/λ) for some λ > 0. (Lamperti [57])


8. Let X and Y be independent RVs with common PDF $f(x) = \beta^{-\alpha}\alpha x^{\alpha-1}$ if 0 < x < β, and = 0 otherwise; α ≥ 1. Let U = min(X, Y) and V = max(X, Y). Find the joint PDF of U and V and the PDF of U + V. Show that U/V and V are independent.


9. Prove Theorem 14.


10. Prove Theorem 8.
11. Prove Theorems 19 and 20.


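12. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_i \sim C(\mu_i, \lambda_i)$, i = 1, 2, ..., n. Show that the RV $X = 1/\sum_{i=1}^{n} X_i^{-1}$ is also a Cauchy RV with parameters μ/(λ² + μ²) and λ/(λ² + μ²), where

$$\lambda = \sum_{i=1}^{n}\frac{\lambda_i}{\lambda_i^2 + \mu_i^2} \quad\text{and}\quad \mu = \sum_{i=1}^{n}\frac{\mu_i}{\lambda_i^2 + \mu_i^2}.$$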
13. Let $X_1, X_2, \ldots, X_n$ be iid C(1, 0) RVs and $a_i \ne 0$, $b_i$, i = 1, 2, ..., n, be any real numbers. Find the distribution of $\sum_{i=1}^{n} 1/(a_iX_i + b_i)$.
14. Suppose that the load of an airplane wing is a random variable X with N(1000, 14400) distribution. The maximum load that the wing can withstand is an RV Y, which is N(1260, 2500). If X and Y are independent, find the probability that the load encountered by the wing is less than its critical load.
15. Let X ~ N(0, 1). Find the PDF of Z = 1/X². If X and Y are iid N(0, 1), deduce that $U = XY/\sqrt{X^2 + Y^2}$ is N(0, 1/4).
16. In Problem 15 let X and Y be independent normal RVs with zero means. Show that $U = XY/\sqrt{X^2 + Y^2}$ is normal. If, in addition, var(X) = var(Y), show that $V = (X^2 - Y^2)/\sqrt{X^2 + Y^2}$ is also normal. Moreover, U and V are independent. (Shepp [102])


17. Let $X_1, X_2, X_3, X_4$ be independent N(0, 1). Show that $Y = X_1X_2 + X_3X_4$ has the PDF $f(y) = \frac{1}{2}e^{-|y|}$, −∞ < y < ∞.


18. Let X ~ N(15, 16). Find (a) P{X ≤ 12}, (b) P{10 ≤ X ≤ 17}, (c) P{10 ≤ X ≤ 19 | X ≤ 17}, and (d) P{|X − 15| ≥ 0.5}.


19. Let X ~ N(−1, 9). Find x such that P{X > x} = 0.38. Also find x such that P{|X + 1| < x} = 0.4.


20. Let X be an RV such that log(X − a) is N(μ, σ²). Show that X has PDF

$$f(x) = \begin{cases} \dfrac{1}{\sigma(x - a)\sqrt{2\pi}}\exp\left\{-\dfrac{[\log(x - a) - \mu]^2}{2\sigma^2}\right\}, & \text{if } x > a,\\ 0, & \text{if } x \le a. \end{cases}$$

If $m_1, m_2$ are the first two moments of this distribution and $\alpha_3 = \mu_3/\mu_2^{3/2}$ is the coefficient of skewness, show that a, μ, σ are given by

$$a = m_1 - \frac{\sqrt{m_2 - m_1^2}}{\eta}, \quad \sigma^2 = \log(1 + \eta^2),$$

and

$$\mu = \log(m_1 - a) - \frac{\sigma^2}{2},$$

where η is the real root of the equation $\eta^3 + 3\eta - \alpha_3 = 0$.

21. Let X ~ G(α, β) and let Y ~ U(0, X).

(a) Find the PDF of Y.

(b) Find the conditional PDF of X given Y = y.

(c) Find P(X + Y < 2).

22. Let X and Y be iid N(0, 1) RVs. Find the PDF of X/|Y|. Also, find the PDF of |X|/|Y|.


23. It is known that X ~ B(α, β), and P(X < 0.2) = 0.22. If α + β = 26, find α and β. (Hint: Use Table ST1.)


24. Let $X_1, X_2, \ldots, X_n$ be iid N(μ, σ²) RVs. Find the distribution of


Y, = Dear ЕХЕ BY 
п = a one ` 
(Xx E) 


25. Let $F_1, F_2, \ldots, F_n$ be n DFs. Show that $\min[F_1(x_1), F_2(x_2), \ldots, F_n(x_n)]$ is an n-dimensional DF with marginal DFs $F_1, F_2, \ldots, F_n$. (Kemp [48])


26. Let X ~ NB(1; p) and Y ~ G(1, 1/λ). Show that X and Y are related by the equation

$$P\{X \le x\} = P\{Y \le [x] + 1\} \quad\text{for } x > 0,\ \lambda = \log\left(\frac{1}{1 - p}\right),$$

where [x] is the largest integer ≤ x. Equivalently, show that

$$P\{Y \in (n, n + 1]\} = P_{\theta}\{X = n\},$$

where θ = 1 − e^{−λ}. (Prochaska [80])


27. Let T be an RV with DF F and write S(t) = 1 − F(t) = P(T > t). The function S is called the survival (or reliability) function of T (or DF F). The function λ(t) = f(t)/S(t) is called the hazard (or failure-rate) function. For the following PDFs, find the hazard function:

(a) Rayleigh: f(t) = (t/α²) exp(−t²/(2α²)), t > 0.

(b) Lognormal: $f(t) = [1/(t\sigma\sqrt{2\pi})]\exp[-(\ln t - \mu)^2/(2\sigma^2)]$, t > 0.

(c) Pareto: $f(t) = \alpha\theta^{\alpha}/t^{\alpha+1}$, t > θ, and = 0 otherwise.

(d) Weibull: $f(t) = (\alpha/\beta)t^{\alpha-1}\exp(-t^{\alpha}/\beta)$, t > 0.

(e) Logistic: $f(t) = (1/\beta)\exp[-(t - \mu)/\beta]\{1 + \exp[-(t - \mu)/\beta]\}^{-2}$, −∞ < t < ∞.
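28. Consider the PDF

$$
f(x) = \left(\frac{\lambda}{2\pi x^3}\right)^{1/2}\exp\left\{-\frac{\lambda(x-\mu)^2}{2\mu^2x}\right\}, \qquad x > 0,
$$

and = 0 otherwise. An RV X with PDF f is said to have an inverse Gaussian distribution with parameters μ and λ, both positive. Show that

$$
EX = \mu, \qquad \operatorname{var}(X) = \frac{\mu^3}{\lambda}, \qquad \text{and} \qquad M(t) = E\exp(tX) = \exp\left\{\frac{\lambda}{\mu}\left[1 - \left(1 - \frac{2\mu^2t}{\lambda}\right)^{1/2}\right]\right\}.
$$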


29. Let f be the PDF of an N(μ, σ²) RV.
(a) For what value of c is the function cf^n, n > 0, a PDF?
(b) Let Φ be the DF of Z ~ N(0, 1). Find E[ZΦ(Z)] and E[Z²Φ(Z)].


5.4 BIVARIATE AND MULTIVARIATE NORMAL DISTRIBUTIONS 


In this section we introduce the bivariate and multivariate normal distributions and
investigate some of their important properties. We note that bivariate analogs of other
PDFs are known, but they are not always uniquely identified. For example, there are
several versions of bivariate exponential PDFs, so called because each has exponential
marginals. We will not encounter any of these bivariate PDFs in this book.


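Definition 1. A two-dimensional RV (X, Y) is said to have a bivariate normal
distribution if the joint PDF is of the form

$$
f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,e^{-Q(x,y)/2}, \qquad -\infty < x < \infty,\ -\infty < y < \infty, \tag{1}
$$

where σ1 > 0, σ2 > 0, |ρ| < 1, and Q is the positive definite quadratic form

$$
Q(x, y) = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]. \tag{2}
$$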


Figure 1 gives graphs of the bivariate normal PDF for selected values of ρ.
We first show that (1) indeed defines a joint PDF. In fact, we prove the following
result.


Theorem 1. The function defined by (1) and (2) with σ1 > 0, σ2 > 0, |ρ| < 1
is a joint PDF. The marginal PDFs of X and Y are, respectively, N(μ1, σ1²) and
N(μ2, σ2²), and ρ is the correlation coefficient between X and Y.


[Fig. 1. Bivariate normal PDF with μ1 = μ2 = 0 and σ1 = σ2 = 1 for selected values of ρ.]
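Proof. Let $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$. Note that

$$
\begin{aligned}
(1-\rho^2)Q(x, y) &= \left[\frac{y-\mu_2}{\sigma_2} - \rho\,\frac{x-\mu_1}{\sigma_1}\right]^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2 \\
&= \left\{\frac{y - [\mu_2 + \rho(\sigma_2/\sigma_1)(x-\mu_1)]}{\sigma_2}\right\}^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2.
\end{aligned}
$$

It follows that

$$
f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left[-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right]\int_{-\infty}^{\infty}\frac{\exp\{-(y-\beta_x)^2/[2\sigma_2^2(1-\rho^2)]\}}{\sigma_2\sqrt{1-\rho^2}\sqrt{2\pi}}\,dy, \tag{3}
$$

where we have written

$$
\beta_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1). \tag{4}
$$

The integrand is the PDF of an N(β_x, σ2²(1 − ρ²)) RV, so that

$$
f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right], \qquad -\infty < x < \infty.
$$

Thus

$$
\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right]dx = \int_{-\infty}^{\infty} f_1(x)\,dx = 1,
$$

and f(x, y) is a joint PDF of two RVs of the continuous type. It also follows that f1
is the marginal PDF of X, so that X is N(μ1, σ1²). In a similar manner we can show
that Y is N(μ2, σ2²).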

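Furthermore, since $f_{Y|X}(y \mid x) = f(x, y)/f_1(x)$, we have

$$
f_{Y|X}(y \mid x) = \frac{1}{\sigma_2\sqrt{1-\rho^2}\sqrt{2\pi}}\exp\left\{-\frac{(y-\beta_x)^2}{2\sigma_2^2(1-\rho^2)}\right\}, \tag{5}
$$

where β_x is given by (4). It is clear, then, that the conditional PDF f_{Y|X}(y | x) given
by (5) is also normal, with parameters β_x and σ2²(1 − ρ²). We have

$$
E\{Y \mid x\} = \beta_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1) \tag{6}
$$

and

$$
\operatorname{var}\{Y \mid x\} = \sigma_2^2(1 - \rho^2). \tag{7}
$$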
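As a numerical sanity check of Theorem 1 and of (4) to (7), the following Python sketch integrates the joint PDF (1) over y with a composite trapezoidal rule (our choice of quadrature, not part of the text) and compares the results with the N(μ1, σ1²) density, with β_x, and with σ2²(1 − ρ²). The parameter values are hypothetical, and the interval μ2 ± 12σ2 is an assumption standing in for the whole real line.

```python
import math

def bivariate_normal_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Joint PDF (1), with the quadratic form Q of (2)."""
    u = (x - mu1) / s1
    v = (y - mu2) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / 2.0) / norm

def normal_pdf(x, mu, s):
    """Density of N(mu, s^2)."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def trapezoid(func, lo, hi, steps=4000):
    """Composite trapezoidal rule for func on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

# Hypothetical parameter values and conditioning point x.
mu1, mu2, s1, s2, rho = 1.0, 2.0, 1.5, 0.5, 0.6
x = 0.7
lo, hi = mu2 - 12.0 * s2, mu2 + 12.0 * s2      # stands in for the whole y-axis

def joint(y):
    return bivariate_normal_pdf(x, y, mu1, mu2, s1, s2, rho)

f1 = trapezoid(joint, lo, hi)                                   # marginal f1(x)
beta_x = mu2 + rho * (s2 / s1) * (x - mu1)                      # equation (4)
cond_mean = trapezoid(lambda y: y * joint(y), lo, hi) / f1
cond_var = trapezoid(lambda y: (y - beta_x) ** 2 * joint(y), lo, hi) / f1

print(f1, normal_pdf(x, mu1, s1))            # Theorem 1: X is N(mu1, s1^2)
print(cond_mean, beta_x)                     # equation (6)
print(cond_var, s2 ** 2 * (1 - rho ** 2))    # equation (7)
```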
In order to show that ρ is the correlation coefficient between X and Y, it suffices
to show that cov(X, Y) = ρσ1σ2. We have from (6)

$$
E(XY) = E\{E(XY \mid X)\} = E\left\{X\left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(X - \mu_1)\right]\right\} = \mu_1\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,\sigma_1^2 = \mu_1\mu_2 + \rho\sigma_1\sigma_2.
$$

It follows that

$$
\operatorname{cov}(X, Y) = E(XY) - \mu_1\mu_2 = \rho\sigma_1\sigma_2.
$$
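The factorization f(x, y) = f1(x) f_{Y|X}(y | x) also gives a simple way to simulate (X, Y): draw X from N(μ1, σ1²) and then Y from the normal conditional (5). The sketch below is a Monte Carlo illustration rather than a proof; it assumes Python's `random.gauss`, a fixed seed, and hypothetical parameter values, and checks that the sample correlation is close to ρ.

```python
import math
import random

def sample_bivariate_normal(n, mu1, mu2, s1, s2, rho, seed=0):
    """Draw n pairs: X ~ N(mu1, s1^2), then Y | X = x from the conditional normal (5)."""
    rng = random.Random(seed)
    cond_sd = s2 * math.sqrt(1.0 - rho * rho)        # square root of (7)
    pairs = []
    for _ in range(n):
        x = rng.gauss(mu1, s1)
        beta_x = mu2 + rho * (s2 / s1) * (x - mu1)    # equation (4)
        pairs.append((x, rng.gauss(beta_x, cond_sd)))
    return pairs

def sample_correlation(pairs):
    """Pearson sample correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical parameters.
pairs = sample_bivariate_normal(100_000, mu1=1.0, mu2=-2.0, s1=2.0, s2=0.5, rho=-0.7)
r = sample_correlation(pairs)
print(round(r, 3))   # should be close to rho = -0.7
```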


Remark 1. If ρ² = 1, then (1) becomes meaningless. But in that case we know
(Theorem 4.5.1) that there exist constants a and b such that P{Y = aX + b} = 1.
We thus have a univariate distribution, which is called the bivariate degenerate (or 
singular) normal distribution. The bivariate degenerate normal distribution does not 
have a PDF but corresponds to an RV (X, Y) whose marginal distributions are normal 
or degenerate and are such that (X, Y) falls on a fixed line with probability 1. It is for 
this reason that degenerate distributions are considered as normal distributions with 
variance 0. 


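Next we compute the MGF M(t1, t2) of a bivariate normal RV (X, Y). If f(x, y)
is the PDF given in (1) and f1 is the marginal PDF of X, we have

$$
\begin{aligned}
M(t_1, t_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x + t_2y} f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} e^{t_2y} f_{Y|X}(y \mid x)\,dy\right] e^{t_1x} f_1(x)\,dx \\
&= \int_{-\infty}^{\infty} e^{t_1x} f_1(x)\exp\left[\frac{\sigma_2^2t_2^2(1-\rho^2)}{2} + t_2\left(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right)\right]dx \\
&= \exp\left[\frac{\sigma_2^2t_2^2(1-\rho^2)}{2} + t_2\mu_2 - \rho t_2\,\frac{\sigma_2}{\sigma_1}\,\mu_1\right]\int_{-\infty}^{\infty} e^{x(t_1 + \rho t_2\sigma_2/\sigma_1)} f_1(x)\,dx.
\end{aligned}
$$

Now

$$
\int_{-\infty}^{\infty} e^{x(t_1 + \rho t_2\sigma_2/\sigma_1)} f_1(x)\,dx = \exp\left[\mu_1\left(t_1 + \rho t_2\,\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2}{2}\left(t_1 + \rho t_2\,\frac{\sigma_2}{\sigma_1}\right)^2\right].
$$

Therefore,

$$
M(t_1, t_2) = \exp\left(\mu_1t_1 + \mu_2t_2 + \frac{\sigma_1^2t_1^2 + \sigma_2^2t_2^2 + 2\rho\sigma_1\sigma_2t_1t_2}{2}\right). \tag{8}
$$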
The following result is an immediate consequence of (8).


Theorem 2. If (X, Y) has a bivariate normal distribution, X and Y are independent
if and only if ρ = 0.
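The content of Theorem 2 can be seen directly in (8): the cross term 2ρσ1σ2t1t2 is the only thing preventing M(t1, t2) from splitting into the product of the N(μ1, σ1²) and N(μ2, σ2²) MGFs. The sketch below checks the factorization on a small grid of (t1, t2) values; the grid and the parameters are arbitrary illustrative choices, and a finite grid is an illustration, not a proof.

```python
import math

def mgf_bivariate_normal(t1, t2, mu1, mu2, s1, s2, rho):
    """MGF (8) of a bivariate normal RV (X, Y)."""
    quad = s1 ** 2 * t1 ** 2 + s2 ** 2 * t2 ** 2 + 2.0 * rho * s1 * s2 * t1 * t2
    return math.exp(mu1 * t1 + mu2 * t2 + quad / 2.0)

def mgf_normal(t, mu, s):
    """MGF of N(mu, s^2): exp(mu t + s^2 t^2 / 2)."""
    return math.exp(mu * t + s ** 2 * t ** 2 / 2.0)

def mgf_factorizes(mu1, mu2, s1, s2, rho, grid=(-1.0, -0.3, 0.5, 1.2)):
    """True if M(t1, t2) = M_X(t1) M_Y(t2) at every pair of grid points."""
    return all(
        math.isclose(mgf_bivariate_normal(a, b, mu1, mu2, s1, s2, rho),
                     mgf_normal(a, mu1, s1) * mgf_normal(b, mu2, s2),
                     rel_tol=1e-12)
        for a in grid for b in grid
    )

print(mgf_factorizes(0.5, -1.0, 1.3, 0.8, rho=0.0))   # True
print(mgf_factorizes(0.5, -1.0, 1.3, 0.8, rho=0.4))   # False
```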


Remark 2. It is quite possible for an RV (X, Y) to have a bivariate density such
that the marginal densities of X and Y are normal and the correlation coefficient is
0, yet X and Y are not independent. Indeed, if the marginal densities of X and Y are
normal, it does not follow that the joint density of (X, Y) is a bivariate normal. Let

$$
f(x, y) = \frac{1}{2}\left\{\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right] + \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{x^2 + 2\rho xy + y^2}{2(1-\rho^2)}\right]\right\}, \tag{9}
$$

with 0 < |ρ| < 1. Here f(x, y) is a joint PDF such that both marginal densities are normal, f(x, y)
is not bivariate normal, and X and Y have zero correlation. But X and Y are not
independent. We have

$$
f_1(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad -\infty < x < \infty, \qquad f_2(y) = \frac{1}{\sqrt{2\pi}}e^{-y^2/2}, \quad -\infty < y < \infty,
$$

and EXY = 0.


Example 1 (Rosenberg [91]). Let f and g be PDFs with corresponding DFs F
and G. Also, let

$$
h(x, y) = f(x)g(y)\{1 + \alpha[2F(x) - 1][2G(y) - 1]\}, \tag{10}
$$

where |α| < 1 is a constant. It was shown in Example 4.3.1 that h is a bivariate
density function with given marginal densities f and g.
In particular, take f and g to be the PDF of N(0, 1), that is,

$$
f(x) = g(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \qquad -\infty < x < \infty, \tag{11}
$$

and let (X, Y) have the joint PDF h(x, y). We will show that X + Y is not normal
except in the trivial case α = 0, when X and Y are independent.

Let Z = X + Y. Then

$$
EZ = 0, \qquad \operatorname{var}(Z) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y).
$$
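It is easy to show (Problem 2) that cov(X, Y) = α/π, so that var(Z) = 2[1 + (α/π)].
If Z is normal, its MGF must be

$$
M_Z(t) = e^{t^2[1 + (\alpha/\pi)]}. \tag{12}
$$

Next we compute the MGF of Z directly from the joint PDF (10). We have

$$
\begin{aligned}
M_Z(t) &= E\{e^{tX + tY}\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t(x+y)}\{1 + \alpha[2F(x) - 1][2F(y) - 1]\}f(x)f(y)\,dx\,dy \\
&= e^{t^2} + \alpha\left\{\int_{-\infty}^{\infty} e^{tx}[2F(x) - 1]f(x)\,dx\right\}^2.
\end{aligned}
$$

Now, substituting u = x + v in the inner integral of the second line,

$$
\begin{aligned}
\int_{-\infty}^{\infty} e^{tx}[2F(x) - 1]f(x)\,dx &= -2\int_{-\infty}^{\infty} e^{tx}[1 - F(x)]f(x)\,dx + e^{t^2/2} \\
&= -\frac{2}{2\pi}\int_{-\infty}^{\infty}\int_{x}^{\infty}\exp\left[-\frac{1}{2}(x^2 + u^2 - 2tx)\right]du\,dx + e^{t^2/2} \\
&= -\frac{1}{\pi}\int_{0}^{\infty}\exp\left[-\frac{v^2}{2} + \frac{(v-t)^2}{4}\right]\int_{-\infty}^{\infty}\exp\left\{-\left[x + \frac{v-t}{2}\right]^2\right\}dx\,dv + e^{t^2/2} \\
&= -\frac{1}{\sqrt{\pi}}\int_{0}^{\infty}\exp\left[-\frac{v^2}{2} + \frac{(v-t)^2}{4}\right]dv + e^{t^2/2} \\
&= -2e^{t^2/2}\int_{0}^{\infty}\frac{1}{2\sqrt{\pi}}\exp\left[-\frac{(v+t)^2}{4}\right]dv + e^{t^2/2} \\
&= e^{t^2/2} - 2e^{t^2/2}P\{Z > t/\sqrt{2}\},
\end{aligned} \tag{13}
$$

where Z is an N(0, 1) RV.
It follows that

$$
M_Z(t) = e^{t^2} + \alpha\left(e^{t^2/2} - 2e^{t^2/2}P\{Z > t/\sqrt{2}\}\right)^2 = e^{t^2}\left[1 + \alpha\left(1 - 2P\{Z > t/\sqrt{2}\}\right)^2\right]. \tag{14}
$$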
If Z = X + Y were normally distributed, (12) and (14) would have to agree for all t and all
|α| < 1, that is,

$$
e^{t^2}e^{\alpha t^2/\pi} = e^{t^2}\left[1 + \alpha\left(1 - 2P\{Z > t/\sqrt{2}\}\right)^2\right]. \tag{15}
$$

For α = 0, the equality clearly holds. The expression within the brackets on the right
side of (15) lies between 1 − |α| and 1 + |α|, whereas for α ≠ 0 the expression e^{αt²/π} tends to ∞ (if α > 0) or to 0 (if α < 0) as |t| → ∞, so
the equality cannot hold for all t when α ≠ 0.
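Two ingredients of Example 1 can be checked numerically: cov(X, Y) = α/π (Problem 2) and the closed form (13). The sketch below assumes f = g = the N(0, 1) PDF as in (11), uses the identity 2F(x) − 1 = erf(x/√2), approximates integrals over the real line by trapezoidal sums on [−12, 12], and takes α = 0.8 as a hypothetical value.

```python
import math

def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def two_f_minus_one(x):
    """2F(x) - 1 for the N(0, 1) DF F, which equals erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2.0))

def trapezoid(func, lo=-12.0, hi=12.0, steps=4000):
    """Composite trapezoidal rule; [-12, 12] stands in for the real line."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

# Under (10) and (11): cov(X, Y) = E(XY) = alpha * [integral of x (2F(x) - 1) f(x) dx]^2.
alpha = 0.8                                    # hypothetical value with |alpha| < 1
inner = trapezoid(lambda x: x * two_f_minus_one(x) * std_normal_pdf(x))
cov_xy = alpha * inner ** 2
print(cov_xy, alpha / math.pi)

def lhs13(t):
    """Left side of (13), by numerical integration."""
    return trapezoid(lambda x: math.exp(t * x) * two_f_minus_one(x) * std_normal_pdf(x))

def rhs13(t):
    """Right side of (13); P{Z > t/sqrt(2)} = erfc(t/2)/2 for Z ~ N(0, 1)."""
    tail = 0.5 * math.erfc(t / 2.0)
    return math.exp(t * t / 2.0) * (1.0 - 2.0 * tail)

for t in (-1.5, 0.0, 0.7, 2.0):
    print(t, lhs13(t), rhs13(t))
```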


Next we investigate the multivariate normal distribution of dimension n, n ≥ 2.
Let M be an n × n real, symmetric, and positive definite matrix. Let x denote the
n × 1 column vector of real numbers (x1, x2, ..., xn)′, and let μ denote the column
vector (μ1, μ2, ..., μn)′, where μi (i = 1, 2, ..., n) are real constants.


Theorem 3. The nonnegative function

$$
f(x) = c\exp\left[-\frac{(x-\mu)'M(x-\mu)}{2}\right], \qquad -\infty < x_i < \infty,\ i = 1, 2, \ldots, n, \tag{16}
$$

defines the joint PDF of some random vector X = (X1, X2, ..., Xn)′, provided that
the constant c is chosen appropriately. The MGF of X exists and is given by

$$
M(t_1, t_2, \ldots, t_n) = \exp\left(t'\mu + \frac{t'M^{-1}t}{2}\right), \tag{17}
$$

where t = (t1, t2, ..., tn)′ and t1, t2, ..., tn are arbitrary real numbers.

Proof. Let

$$
I = c\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left[t'x - \frac{(x-\mu)'M(x-\mu)}{2}\right]\prod_{i=1}^n dx_i. \tag{18}
$$

Changing the variables of integration to y1, y2, ..., yn by writing xi − μi = yi,
i = 1, 2, ..., n, and y = (y1, y2, ..., yn)′, we have x − μ = y and

$$
I = c\,e^{t'\mu}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left(t'y - \frac{y'My}{2}\right)\prod_{i=1}^n dy_i. \tag{19}
$$
Since M is positive definite, it follows that all the n characteristic roots of M, say
m1, m2, ..., mn, are positive. Moreover, since M is symmetric, there exists an n × n
orthogonal matrix L such that L′ML is a diagonal matrix with diagonal elements
m1, m2, ..., mn. Let us change the variables to z1, z2, ..., zn by writing y = Lz,
where z′ = (z1, z2, ..., zn), and note that the Jacobian of this orthogonal transformation is |L|. Since L′L = I_n, where I_n is an n × n unit matrix, |L|² = 1, so the absolute value of the Jacobian is 1, and we
have

$$
I = c\,e^{t'\mu}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left(t'Lz - \frac{z'L'MLz}{2}\right)\prod_{i=1}^n dz_i. \tag{20}
$$

If we write t′L = u′ = (u1, u2, ..., un), then t′Lz = Σ ui zi. Also, L′ML =
diag(m1, m2, ..., mn), so that z′L′MLz = Σ mi zi². The integral in (20) can
therefore be written as

$$
\prod_{i=1}^n\int_{-\infty}^{\infty}\exp\left(u_iz_i - \frac{m_iz_i^2}{2}\right)dz_i = \prod_{i=1}^n\sqrt{\frac{2\pi}{m_i}}\exp\left(\frac{u_i^2}{2m_i}\right).
$$

It follows that

$$
I = c\,e^{t'\mu}\,\frac{(2\pi)^{n/2}}{(m_1m_2\cdots m_n)^{1/2}}\exp\left(\sum_{i=1}^n\frac{u_i^2}{2m_i}\right). \tag{21}
$$

Setting t1 = t2 = ··· = tn = 0, we see from (18) and (21) that

$$
\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\prod_{i=1}^n dx_i = \frac{c\,(2\pi)^{n/2}}{(m_1m_2\cdots m_n)^{1/2}}.
$$

By choosing

$$
c = \frac{(m_1m_2\cdots m_n)^{1/2}}{(2\pi)^{n/2}}, \tag{22}
$$

we see that f is a joint PDF of some random vector X, as asserted.
Finally, since

$$
(L'ML)^{-1} = \operatorname{diag}(m_1^{-1}, m_2^{-1}, \ldots, m_n^{-1}),
$$

we have

$$
\sum_{i=1}^n\frac{u_i^2}{m_i} = u'(L'M^{-1}L)u = t'M^{-1}t.
$$

Also,

$$
|M^{-1}| = |M|^{-1} = (m_1m_2\cdots m_n)^{-1}.
$$

It follows from (21) and (22) that the MGF of X is given by (17), and we may write

$$
c = \frac{1}{(2\pi)^{n/2}|M^{-1}|^{1/2}}. \tag{23}
$$

This completes the proof of Theorem 3.
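For n = 2 the normalizing constant (22) and the MGF (17) can be checked by brute-force quadrature. The sketch below uses a hypothetical positive definite M, takes μ = 0, and approximates the integral over R² by a product trapezoidal rule on [−10, 10]². Truncating to that square is an assumption, harmless here because the integrand is negligible outside it.

```python
import math

def quad_form(M, v):
    """v' M v for a symmetric 2 x 2 matrix M."""
    return M[0][0] * v[0] ** 2 + 2.0 * M[0][1] * v[0] * v[1] + M[1][1] * v[1] ** 2

def integrate_2d(func, half_width=10.0, steps=400):
    """Product trapezoidal rule on the square [-half_width, half_width]^2."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        wi = 0.5 if i in (0, steps) else 1.0
        a = -half_width + i * h
        for j in range(steps + 1):
            wj = 0.5 if j in (0, steps) else 1.0
            total += wi * wj * func(a, -half_width + j * h)
    return total * h * h

M = [[2.0, 0.6], [0.6, 1.0]]                      # hypothetical positive definite M
det_M = M[0][0] * M[1][1] - M[0][1] ** 2
M_inv = [[M[1][1] / det_M, -M[0][1] / det_M],
         [-M[0][1] / det_M, M[0][0] / det_M]]

c = math.sqrt(det_M) / (2.0 * math.pi)            # (22) with n = 2, since m1 m2 = |M|
total_mass = c * integrate_2d(lambda a, b: math.exp(-0.5 * quad_form(M, (a, b))))

t = (0.3, -0.5)                                   # mu = 0 in this sketch
mgf_numeric = c * integrate_2d(
    lambda a, b: math.exp(t[0] * a + t[1] * b - 0.5 * quad_form(M, (a, b))))
mgf_formula = math.exp(0.5 * quad_form(M_inv, t))  # (17) with mu = 0

print(total_mass)                                 # close to 1
print(mgf_numeric, mgf_formula)
```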
Let us write M^{−1} = ((σij)), i, j = 1, 2, ..., n. Then

$$
M(0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left(t_i\mu_i + \frac{\sigma_{ii}t_i^2}{2}\right)
$$

is the MGF of Xi, i = 1, 2, ..., n. Thus each Xi is N(μi, σii), i = 1, 2, ..., n. For
i ≠ j, we have for the MGF of Xi and Xj

$$
M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = \exp\left(t_i\mu_i + t_j\mu_j + \frac{\sigma_{ii}t_i^2 + 2\sigma_{ij}t_it_j + \sigma_{jj}t_j^2}{2}\right).
$$

This is the MGF of a bivariate normal distribution with means μi, μj, variances σii,
σjj, and covariance σij. Thus we see that

$$
\mu' = (\mu_1, \mu_2, \ldots, \mu_n) \tag{24}
$$

is the mean vector of X′ = (X1, ..., Xn),

$$
\sigma_{ii} = \sigma_i^2 = \operatorname{var}(X_i), \qquad i = 1, 2, \ldots, n, \tag{25}
$$

and

$$
\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j, \qquad i \ne j;\ i, j = 1, 2, \ldots, n. \tag{26}
$$

The matrix M^{−1} is called the dispersion (variance-covariance) matrix of the multivariate normal distribution.

If σij = 0 for i ≠ j, the matrix M^{−1} is a diagonal matrix, the MGF (17) factors into the product of the MGFs of X1, X2, ..., Xn, and it follows that
the RVs X1, X2, ..., Xn are independent. Thus we have the following analog of
Theorem 2.


Theorem 4. The components X1, X2, ..., Xn of a jointly normally distributed
RV X are independent if and only if the covariances σij = 0 for all i ≠ j (i, j =
1, 2, ..., n).


The following result is stated without proof. The proof is similar to the two-variate
case except that now we consider the quadratic form in n variables: $E\left\{\sum_{i=1}^n t_i(X_i - \mu_i)\right\}^2 \ge 0$.


Theorem 5. The probability that the RVs X1, X2, ..., Xn with finite variances
satisfy at least one linear relationship is 1 if and only if |M| = 0.

Accordingly, if |M| = 0, all the probability mass is concentrated on a hyperplane
of dimension < n.
Theorem 6. Let (X1, X2, ..., Xn) be an n-dimensional RV with a normal distribution. Let Y1, Y2, ..., Yk, k ≤ n, be linear functions of Xj (j = 1, 2, ..., n).
Then (Y1, Y2, ..., Yk) also has a multivariate normal distribution.


Proof. Without loss of generality let us assume that EXi = 0, i = 1, 2, ..., n.
Let

$$
Y_p = \sum_{j=1}^n A_{pj}X_j, \qquad p = 1, 2, \ldots, k. \tag{27}
$$

Then EYp = 0, p = 1, 2, ..., k, and

$$
\operatorname{cov}(Y_p, Y_q) = \sum_{i,j=1}^n A_{pi}A_{qj}\sigma_{ij}, \tag{28}
$$

where E(XiXj) = σij, i, j = 1, 2, ..., n.
The MGF of (Y1, Y2, ..., Yk) is given by

$$
M^*(t_1, t_2, \ldots, t_k) = E\exp\left(t_1\sum_{j=1}^n A_{1j}X_j + \cdots + t_k\sum_{j=1}^n A_{kj}X_j\right).
$$

Writing $u_j = \sum_{p=1}^k t_pA_{pj}$, j = 1, 2, ..., n, we have

$$
\begin{aligned}
M^*(t_1, t_2, \ldots, t_k) &= E\exp\left(\sum_{j=1}^n u_jX_j\right) \\
&= \exp\left(\frac{1}{2}\sum_{i,j=1}^n\sigma_{ij}u_iu_j\right) \qquad \text{by (17)} \\
&= \exp\left(\frac{1}{2}\sum_{i,j=1}^n\sigma_{ij}\sum_{l,m=1}^k t_lt_mA_{li}A_{mj}\right) \\
&= \exp\left(\frac{1}{2}\sum_{l,m=1}^k t_lt_m\operatorname{cov}(Y_l, Y_m)\right).
\end{aligned} \tag{29}
$$

When (17) and (29) are compared, the result follows.


Corollary 1. Every marginal distribution of an n-dimensional normal distribution is univariate normal. Moreover, any linear function of X1, X2, ..., Xn is univariate normal.
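Equation (28) says that the dispersion matrix of Y = (Y1, ..., Yk)′ is AΣA′, where Σ = ((σij)) and A = ((A_pj)). The Monte Carlo sketch below checks this for a hypothetical 3 × 3 Σ and 2 × 3 A. It generates X with dispersion matrix Σ as X = LZ, where LL′ = Σ is a Cholesky factorization and Z has iid N(0, 1) components; that construction is a standard device assumed here, not one used in the text.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L' = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def formula_28(A, S):
    """cov(Y_p, Y_q) = sum over i, j of A_pi A_qj sigma_ij, i.e. the matrix A S A'."""
    n = len(S)
    return [[sum(A[p][i] * A[q][j] * S[i][j] for i in range(n) for j in range(n))
             for q in range(len(A))] for p in range(len(A))]

# Hypothetical dispersion matrix S = ((sigma_ij)) and coefficients A (k = 2, n = 3).
S = [[4.0, 1.2, -0.8], [1.2, 2.0, 0.5], [-0.8, 0.5, 1.0]]
A = [[1.0, -1.0, 2.0], [0.5, 0.0, -3.0]]

rng = random.Random(1)
L = cholesky(S)
N = 100_000
ys = []
for _ in range(N):
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    x = matvec(L, z)          # X = L Z has mean 0 and dispersion matrix S
    ys.append(matvec(A, x))   # Y = A X, as in (27)

empirical = [[sum(y[p] * y[q] for y in ys) / N for q in range(2)] for p in range(2)]
theory = formula_28(A, S)     # approximately [[2.4, -1.5], [-1.5, 12.4]]
print(empirical)
print(theory)
```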
Corollary 2. If X1, X2, ..., Xn are iid N(μ, σ²), and A is an n × n orthogonal transformation matrix, the components Y1, Y2, ..., Yn of Y = AX, where
X = (X1, ..., Xn)′, are independent RVs, each normally distributed with the same
variance σ².
We have from (27) and (28)

$$
\operatorname{cov}(Y_p, Y_q) = \sum_{j=1}^n A_{pj}A_{qj}\sigma_{jj} + \sum_{i \ne j}A_{pi}A_{qj}\sigma_{ij} = \begin{cases} 0 & \text{if } p \ne q, \\ \sigma^2 & \text{if } p = q, \end{cases}
$$

since σij = 0 for i ≠ j, σjj = σ², $\sum_{j=1}^n A_{pj}A_{qj} = 0$ for p ≠ q, and $\sum_{j=1}^n A_{pj}^2 = 1$. It follows that

$$
M^*(t_1, t_2, \ldots, t_n) = \exp\left(\frac{1}{2}\sigma^2\sum_{l=1}^n t_l^2\right),
$$

and Corollary 2 follows.
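For Corollary 2, the same computation with σij = σ² when i = j and 0 otherwise reduces to σ²AA′ = σ²I_n. The sketch below uses the Helmert matrix as one concrete orthogonal A (a hypothetical choice; any orthogonal matrix would do) and confirms that the covariances from (28) vanish off the diagonal and equal σ² on it.

```python
import math

def helmert(n):
    """n x n Helmert matrix, one standard example of an orthogonal matrix."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        d = math.sqrt(k * (k - 1))
        rows.append([1.0 / d] * (k - 1) + [-(k - 1) / d] + [0.0] * (n - k))
    return rows

def cov_of_AX(A, S):
    """Formula (28): cov(Y_p, Y_q) = sum over i, j of A_pi A_qj sigma_ij."""
    n = len(S)
    return [[sum(A[p][i] * A[q][j] * S[i][j] for i in range(n) for j in range(n))
             for q in range(len(A))] for p in range(len(A))]

n, sigma2 = 5, 2.5                        # hypothetical dimension and common variance
S = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]   # iid components
C = cov_of_AX(helmert(n), S)
max_off = max(abs(C[p][q]) for p in range(n) for q in range(n) if p != q)
max_diag_err = max(abs(C[p][p] - sigma2) for p in range(n))
print(max_off, max_diag_err)              # both at rounding-error level
```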


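Theorem 7. Let X = (X1, X2, ..., Xn)′. Then X has an n-dimensional normal
distribution if and only if every linear function of X,

$$
X't = t_1X_1 + t_2X_2 + \cdots + t_nX_n,
$$

has a univariate normal distribution.


Proof. Suppose that X′t is normal for any t. Then the MGF of X′t is given by

$$
M(s) = \exp\left(bs + \frac{1}{2}\sigma^2s^2\right). \tag{30}
$$

Here $b = E\{X't\} = \sum_{i=1}^n t_i\mu_i = t'\mu$, where μ′ = (μ1, ..., μn), and
$\sigma^2 = \operatorname{var}(X't) = \operatorname{var}\left(\sum_{i=1}^n t_iX_i\right) = t'M^{-1}t$, where M^{−1} is the dispersion matrix of X. Thus

$$
M(s) = \exp\left(t'\mu\,s + \frac{1}{2}t'M^{-1}t\,s^2\right). \tag{31}
$$

Let s = 1; then

$$
M(1) = E\{e^{t'X}\} = \exp\left(t'\mu + \frac{1}{2}t'M^{-1}t\right). \tag{32}
$$

Since (32) holds for every t, it is the MGF of X, and since the MGF determines the distribution uniquely, it follows from (17) that X has a multivariate normal distribution.
The converse follows from Corollary 1 to Theorem 6.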


Many characterization results for the multivariate normal distribution are now 
available. We refer the reader to Lukacs and Laha [67, p. 79]. 
PROBLEMS 5.4


1. Let (X, Y) have joint PDF

$$
f(x, y) = \frac{1}{6\pi\sqrt{7}}\exp\left[-\frac{8}{7}\left(\frac{x^2}{16} - \frac{31}{32}x + \frac{xy}{8} + \frac{y^2}{9} - \frac{4}{3}y + \frac{71}{16}\right)\right]
$$

for −∞ < x < ∞, −∞ < y < ∞.
(a) Find the means and variances of X and Y. Also find ρ.
(b) Find the conditional PDF of Y given X = x and E{Y | x}, var{Y | x}.
(c) Find P{4 ≤ Y ≤ 6 | X = 4}.

2. In Example 1, show that cov(X, Y) = α/π.


3. Let (X, Y) be a bivariate normal RV with parameters μ1, μ2, σ1², σ2², and ρ.
What is the distribution of X + Y? Compare your result with that of Example 1.


4. Let (X, Y) be a bivariate normal RV with parameters μ1, μ2, σ1², σ2², and ρ, and
let U = aX + b, a ≠ 0, and V = cY + d, c ≠ 0. Find the joint distribution of
(U, V).


5. Let (X, Y) be a bivariate normal RV with parameters μ1 = 5, μ2 = 8, σ1² = 16,
σ2² = 9, and ρ = 0.6. Find P{5 < Y < 11 | X = 2}.


6. Let X and Y be jointly normal with means 0. Also, let
W = X cos θ + Y sin θ, Z = X cos θ − Y sin θ.
Find θ such that W and Z are independent.


7. Let (X, Y) be a normal RV with parameters μ1, μ2, σ1², σ2², and ρ. Find a necessary and sufficient condition for X + Y and X − Y to be independent.


8. For a bivariate normal RV with parameters μ1, μ2, σ1, σ2, and ρ show that

$$
P\{X > \mu_1, Y > \mu_2\} = \frac{1}{4} + \frac{1}{2\pi}\tan^{-1}\frac{\rho}{\sqrt{1-\rho^2}}.
$$

[Hint: The required probability is P{(X − μ1)/σ1 > 0, (Y − μ2)/σ2 > 0}.
Change to polar coordinates and integrate.]


9. Show that every variance-covariance matrix is symmetric positive semidefinite
and conversely. If the variance-covariance matrix is not positive definite, then
with probability 1 the random (column) vector X lies in some hyperplane c′X =
a with c ≠ 0.


10. Let (X, Y) be a bivariate normal RV with EX = EY = 0, var(X) = var(Y) = 1,
and cov(X, Y) = ρ. Show that the RV Z = Y/X has a Cauchy distribution.

11. (a) Show that

$$
f(x) = \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^n x_i^2\right)\left[1 + \prod_{i=1}^n\left(x_ie^{-x_i^2/2}\right)\right]
$$

is a joint PDF on R^n.
(b) Let (X1, X2, ..., Xn) have PDF f given in (a). Show that the RVs in any
proper subset of {X1, X2, ..., Xn} containing two or more elements are
independent standard normal RVs.


5.5 EXPONENTIAL FAMILY OF DISTRIBUTIONS 


Most of the distributions that we have so far encountered belong to a general family
of distributions that we now study. Let Θ be an interval on the real line, and let
{fθ : θ ∈ Θ} be a family of PDFs (PMFs). Here and in what follows we write
x = (x1, x2, ..., xn) unless otherwise specified.


Definition 1. If there exist real-valued functions Q(θ) and D(θ) on Θ and Borel-measurable functions T(x1, x2, ..., xn) and S(x1, x2, ..., xn) on R^n such that

$$
f_\theta(x_1, x_2, \ldots, x_n) = \exp[Q(\theta)T(x) + D(\theta) + S(x)], \tag{1}
$$

we say that the family {fθ, θ ∈ Θ} is a one-parameter exponential family.


Let X1, X2, ..., Xm be iid with PMF (PDF) fθ. Then the joint distribution of
X = (X1, X2, ..., Xm) is given by

$$
g_\theta(x) = \prod_{i=1}^m f_\theta(x_i) = \prod_{i=1}^m\exp[Q(\theta)T(x_i) + D(\theta) + S(x_i)] = \exp\left[Q(\theta)\sum_{i=1}^m T(x_i) + mD(\theta) + \sum_{i=1}^m S(x_i)\right],
$$

where x = (x1, x2, ..., xm), xj = (xj1, xj2, ..., xjn), j = 1, 2, ..., m, and it
follows that {gθ : θ ∈ Θ} is again a one-parameter exponential family.


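Example 1. Let X ~ N(μ0, σ²), where μ0 is known and σ² unknown. Then

$$
f_{\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu_0)^2}{2\sigma^2}\right] = \exp\left[-\log(\sigma\sqrt{2\pi}) - \frac{(x-\mu_0)^2}{2\sigma^2}\right]
$$

is a one-parameter exponential family with

$$
Q(\sigma^2) = -\frac{1}{2\sigma^2}, \qquad T(x) = (x-\mu_0)^2, \qquad S(x) = 0, \qquad \text{and} \qquad D(\sigma^2) = -\log(\sigma\sqrt{2\pi}).
$$

If X ~ N(μ, σ0²), where σ0 is known but μ is unknown, then

$$
f_\mu(x) = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma_0^2}\right] = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma_0^2}\right)\exp\left(-\frac{\mu^2}{2\sigma_0^2} + \frac{\mu x}{\sigma_0^2}\right)
$$

is a one-parameter exponential family with

$$
Q(\mu) = \frac{\mu}{\sigma_0^2}, \qquad D(\mu) = -\frac{\mu^2}{2\sigma_0^2}, \qquad T(x) = x,
$$

and

$$
S(x) = -\left[\frac{x^2}{2\sigma_0^2} + \log(\sigma_0\sqrt{2\pi})\right].
$$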


Example 2. Let X ~ P(λ), λ > 0 unknown. Then

$$
P\{X = x\} = e^{-\lambda}\frac{\lambda^x}{x!} = \exp[-\lambda + x\log\lambda - \log(x!)], \qquad x = 0, 1, 2, \ldots,
$$

and we see that the family of Poisson PMFs with parameter λ is a one-parameter
exponential family.
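A quick check of Example 2: with Q(λ) = log λ, T(x) = x, D(λ) = −λ, and S(x) = −log(x!), the exponential-family form reproduces the Poisson PMF. The sketch below compares the two expressions over a few hypothetical values of λ and x, using `math.lgamma(x + 1)` for log(x!).

```python
import math

def poisson_pmf(x, lam):
    """e^{-lam} lam^x / x!, computed directly."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_expfam(x, lam):
    """exp[Q(lam) T(x) + D(lam) + S(x)] with Q = log lam, T(x) = x, D = -lam, S(x) = -log x!."""
    return math.exp(math.log(lam) * x - lam - math.lgamma(x + 1))

worst = max(abs(poisson_expfam(x, lam) / poisson_pmf(x, lam) - 1.0)
            for lam in (0.5, 3.0, 12.0) for x in range(30))
print(worst)   # relative discrepancy at rounding-error level
```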

Some other important examples of one-parameter exponential families are binomial, G(α, β) (provided that one of α, β is fixed), B(α, β) (provided that one of α, β
is fixed), negative binomial, and geometric. The Cauchy family of densities and the
uniform distribution on [0, θ] do not belong to this class.


Theorem 1. Let {fθ : θ ∈ Θ} be a one-parameter exponential family of PDFs
(PMFs) given in (1). Then the family of distributions of T(X) is also a one-parameter
exponential family of PDFs (PMFs), given by

$$
g_\theta(t) = \exp[tQ(\theta) + D(\theta) + S^*(t)]
$$

for suitable S*(t).
The proof of Theorem 1 is a simple application of the transformation of variables technique studied in Section 4.4 and is left as an exercise, at least for the cases
considered in Section 4.4. For the general case we refer to Lehmann [63, p. 58].

Let us now consider the k-parameter exponential family, k ≥ 2. Let Θ ⊆ R^k be a
k-dimensional interval.


Definition 2. If there exist real-valued functions Q1, Q2, ..., Qk, D defined on
Θ, and Borel-measurable functions T1, T2, ..., Tk, S on R^n such that

$$
f_\theta(x) = \exp\left[\sum_{i=1}^k Q_i(\theta)T_i(x) + D(\theta) + S(x)\right], \tag{2}
$$

we say that the family {fθ, θ ∈ Θ} is a k-parameter exponential family.


Once again, if X = (X1, X2, ..., Xm) and the Xj are iid with common distribution
(2), the joint distributions of X form a k-parameter exponential family. An analog of
Theorem 1 also holds for the k-parameter exponential family.


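Example 3. The most important example of a k-parameter exponential family is
N(μ, σ²) when both μ and σ² are unknown. We have θ = (μ, σ²), Θ = {(μ, σ²) : −∞ < μ < ∞, σ² > 0}, and

$$
f_\theta(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2}\right) = \exp\left[-\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{1}{2}\left(\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right].
$$

It follows that fθ is a two-parameter exponential family with

$$
Q_1(\theta) = -\frac{1}{2\sigma^2}, \qquad Q_2(\theta) = \frac{\mu}{\sigma^2}, \qquad T_1(x) = x^2, \qquad T_2(x) = x,
$$

$$
D(\theta) = -\frac{1}{2}\left[\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right], \qquad \text{and} \qquad S(x) = 0.
$$

Other examples are the G(α, β) and B(α, β) distributions when both α, β are
unknown, and the multinomial distribution. U[α, β] does not belong to this family,
nor does C(α, β).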
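Similarly for Example 3, the sketch below evaluates exp[Q1(θ)T1(x) + Q2(θ)T2(x) + D(θ) + S(x)] with the functions listed above and compares it with the usual N(μ, σ²) density at a few hypothetical points; the argument `var` stands for σ².

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normal_expfam(x, mu, var):
    """exp[Q1 T1(x) + Q2 T2(x) + D(theta) + S(x)] with the choices of Example 3."""
    q1, q2 = -1.0 / (2.0 * var), mu / var            # Q1(theta), Q2(theta)
    t1, t2 = x * x, x                                # T1(x), T2(x)
    d = -0.5 * (mu * mu / var + math.log(2.0 * math.pi * var))
    return math.exp(q1 * t1 + q2 * t2 + d)           # S(x) = 0

worst = max(abs(normal_expfam(x, mu, v) / normal_pdf(x, mu, v) - 1.0)
            for x in (-3.0, -0.4, 0.0, 1.7, 5.0)
            for mu in (-2.0, 0.0, 1.5)
            for v in (0.25, 1.0, 9.0))
print(worst)   # relative discrepancy at rounding-error level
```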
Some general properties of exponential families will be studied in Chapter 8, and
the importance of these families will then become evident.


Remark 1. The form in (2) is not unique, as easily seen by substituting aQi for
Qi and (1/a)Ti for Ti, a ≠ 0. This, however, is not going to be a problem in statistical
considerations.


Remark 2. The integer k in Definition 2 is also not unique since the family
{1, Q1, ..., Qk} or {1, T1, ..., Tk} may be linearly dependent. In general, k need
not be the dimension of Θ.


Remark 3. The support {x : fθ(x) > 0} does not depend on θ.


Remark 4. In (2), one can change parameters to ηi = Qi(θ), i = 1, 2, ..., k,
so that

$$
f_\eta(x) = \exp\left[\sum_{i=1}^k\eta_iT_i(x) + D(\eta) + S(x)\right], \tag{3}
$$

where the parameters η = (η1, η2, ..., ηk) are called natural parameters. Again, the ηi
may be linearly dependent so that one of the ηi may be eliminated.


PROBLEMS 5.5


1. Show that the following families of distributions are one-parameter exponential
families:

(a) X ~ b(n, p).

(b) X ~ G(α, β), (i) if α is known, and (ii) if β is known.
(c) X ~ B(α, β), (i) if α is known, and (ii) if β is known.
(d) X ~ NB(r; p), where r is known, p unknown.


2. Let X ~ C(1, θ). Show that the family of distributions of X is not a one-parameter
exponential family.


3. Let X ~ U[0, θ], θ ∈ (0, ∞). Show that the family of distributions of X is not an
exponential family.


4. Is the family of PDFs

$$
f_\theta(x) = \frac{1}{2}e^{-|x-\theta|}, \qquad -\infty < x < \infty,\ \theta \in (-\infty, \infty),
$$

an exponential family?

5. Show that the following families of distributions are two-parameter exponential
families:
(a) X ~ G(α, β), both α and β unknown.
(b) X ~ B(α, β), both α and β unknown.


6. Show that the families of distributions U[α, β] and C(α, β) do not belong to the
exponential families.


7. Show that the multinomial distributions form an exponential family.


CHAPTER 6


Limit Theorems 


6.1 INTRODUCTION 


In this chapter we investigate convergence properties of sequences of random vari- 
ables. The three limit results proved here, namely, the two laws of large numbers and 
the central limit theorem, are of considerable importance in the study of probability 
and statistics. Just as in analysis, we distinguish among several types of convergence. 
The various modes of convergence are introduced in Section 6.2. Sections 6.3 and 
6.4 deal with the laws of large numbers, and the central limit theorem is proved in 
Section 6.6. 

The reader may find some parts of this chapter difficult, at least on first reading. 
These have been identified with a dagger (†) and include the concept of almost sure
convergence (Section 6.2) and the strong law of large numbers (Section 6.4). Since 
the central limit result is basic and will be used repeatedly in the rest of the book, it 
is important for readers to familiarize themselves with this result and its application 
and to understand its significance. Similarly, on the first reading it will suffice to 
know the strong law of large numbers and to understand its significance. 


6.2 MODES OF CONVERGENCE 


In this section we consider several modes of convergence and investigate their inter- 
relationships. We begin with the weakest mode. 


Definition 1. Let {Fn} be a sequence of distribution functions. If there exists a
DF F such that as n → ∞,

$$
F_n(x) \to F(x) \tag{1}
$$

at every point x at which F is continuous, we say that Fn converges in law (or,
weakly) to F, and we write $F_n \xrightarrow{w} F$.

If {Xn} is a sequence of RVs and {Fn} is the corresponding sequence of DFs, we
say that Xn converges in distribution (or law) to X if there exists an RV X with DF
F such that $F_n \xrightarrow{w} F$. We write $X_n \xrightarrow{L} X$.
It must be remembered that it is quite possible for a given sequence of DFs to
converge to a function that is not a DF.


Example 1. Consider the sequence of DFs

$$
F_n(x) = \begin{cases} 0, & x < n, \\ 1, & x \ge n. \end{cases}
$$

Here Fn(x) is the DF of the RV Xn degenerate at x = n. We see that Fn(x) converges
to a function F that is identically equal to 0, and hence is not a DF.


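Example 2. Let X1, X2, ..., Xn be iid RVs with common density function

$$
f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \quad (0 < \theta < \infty), \\ 0, & \text{otherwise}. \end{cases}
$$

Let X(n) = max(X1, X2, ..., Xn). Then the density function of X(n) is

$$
f_n(x) = \begin{cases} \dfrac{nx^{n-1}}{\theta^n}, & 0 < x < \theta, \\ 0, & \text{otherwise}, \end{cases}
$$

and the DF of X(n) is

$$
F_n(x) = \begin{cases} 0, & x < 0, \\ (x/\theta)^n, & 0 \le x < \theta, \\ 1, & x \ge \theta. \end{cases}
$$

We see that as n → ∞,

$$
F_n(x) \to F(x) = \begin{cases} 0, & x < \theta, \\ 1, & x \ge \theta, \end{cases}
$$

which is a DF. Thus $F_n \xrightarrow{w} F$.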
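Example 2 can also be illustrated by simulation. The sketch below, assuming θ = 2 and x = 1.8 as hypothetical values, estimates P{X(n) ≤ x} from repeated samples of size n and prints it next to Fn(x) = (x/θ)^n; both shrink toward F(x) = 0 as n grows, since x < θ.

```python
import random

def empirical_max_df(n, theta, x, reps, rng):
    """Fraction of `reps` samples of size n (iid uniform on (0, theta)) whose maximum is <= x."""
    hits = 0
    for _ in range(reps):
        sample_max = max(rng.uniform(0.0, theta) for _ in range(n))
        if sample_max <= x:
            hits += 1
    return hits / reps

rng = random.Random(42)
theta, x = 2.0, 1.8             # hypothetical values with x < theta
results = {}
for n in (5, 20, 80):
    results[n] = (empirical_max_df(n, theta, x, 4000, rng), (x / theta) ** n)
    print(n, results[n])        # (simulated P{X(n) <= x}, F_n(x))
```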


The following example shows that convergence in distribution does not imply 
convergence of moments. 


Example 3. Let {Fn} be a sequence of DFs defined by

$$
F_n(x) = \begin{cases} 0, & x < 0, \\ 1 - \dfrac{1}{n}, & 0 \le x < n, \\ 1, & n \le x. \end{cases}
$$

Clearly, $F_n \xrightarrow{w} F$, where F is the DF given by

$$
F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases}
$$

Note that Fn is the DF of the RV Xn with PMF

$$
P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = n\} = \frac{1}{n},
$$

and F is the DF of the RV X degenerate at 0. We have

$$
EX_n^k = n^k\left(\frac{1}{n}\right) = n^{k-1},
$$

where k is a positive integer. Also, EX^k = 0, so that

$$
EX_n^k \not\to EX^k \quad \text{for any } k \ge 1.
$$
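The moments in Example 3 can be tabulated exactly with rational arithmetic. The sketch below is only an illustration: it shows P{Xn ≠ 0} = 1/n going to 0 while EXn = 1 and EXn² = n do not approach EX = EX² = 0.

```python
from fractions import Fraction

def moment(n, k):
    """E X_n^k when P{X_n = 0} = 1 - 1/n and P{X_n = n} = 1/n (exact arithmetic)."""
    return Fraction(n) ** k * Fraction(1, n)

for n in (1, 10, 100, 1000):
    # columns: n, P{X_n != 0}, E X_n, E X_n^2
    print(n, Fraction(1, n), moment(n, 1), moment(n, 2))
```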


We next give an example to show that weak convergence of distribution functions 
does not imply the convergence of corresponding PMFs or PDFs. 


Example 4. Let {Xn} be a sequence of RVs with PMF

$$
f_n(x) = P\{X_n = x\} = \begin{cases} 1, & \text{if } x = 2 + \dfrac{1}{n}, \\ 0, & \text{otherwise}. \end{cases}
$$

Note that none of the fn's assigns any probability to the point x = 2. It follows that

$$
f_n(x) \to f(x) \quad \text{as } n \to \infty,
$$

where f(x) = 0 for all x. However, the sequence of DFs {Fn} of RVs Xn converges
to the function

$$
F(x) = \begin{cases} 0, & x < 2, \\ 1, & x \ge 2, \end{cases}
$$

at all continuity points of F. Since F is the DF of the RV degenerate at x = 2,
$F_n \xrightarrow{w} F$.


The following result is easy to prove.
Theorem 1. Let {Xn} be a sequence of integer-valued RVs. Also, let fn(k) =
P{Xn = k}, k = 0, 1, 2, ..., be the PMF of Xn, n = 1, 2, ..., and f(k) = P{X =
k} be the PMF of X. Then

$$
f_n(x) \to f(x) \text{ for all } x \iff X_n \xrightarrow{L} X.
$$
In the continuous case we state the following result of Scheffé [98] without proof.
Theorem 2. Let Xn, n = 1, 2, ..., and X be continuous RVs such that

$$
f_n(x) \to f(x) \quad \text{for (almost) all } x \text{ as } n \to \infty.
$$

Here, fn and f are the PDFs of Xn and X, respectively. Then $X_n \xrightarrow{L} X$.


The following result is easy to establish.


Theorem 3. Let {Xn} be a sequence of RVs such that $X_n \xrightarrow{L} X$, and let c be a
constant. Then

(a) $X_n + c \xrightarrow{L} X + c$, and

(b) $cX_n \xrightarrow{L} cX$, c ≠ 0.

A slightly stronger concept of convergence is defined by convergence in probability.


Definition 2. Let {Xn} be a sequence of RVs defined on some probability space
(Ω, S, P). We say that the sequence {Xn} converges in probability to the RV X if
for every ε > 0,

$$
P\{|X_n - X| > \varepsilon\} \to 0 \quad \text{as } n \to \infty. \tag{2}
$$

We write $X_n \xrightarrow{P} X$.


Remark 1. We emphasize that the definition says nothing about the convergence
of the RVs Xn to the RV X in the sense in which it is understood in real analysis.
Thus $X_n \xrightarrow{P} X$ does not imply that given ε > 0, we can find an N such that
|Xn − X| < ε for n ≥ N. Definition 2 speaks only of the convergence of the sequence of
probabilities P{|Xn − X| > ε} to 0.


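Example 5. Let {Xn} be a sequence of RVs with PMF

$$
P\{X_n = 1\} = \frac{1}{n}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n}.
$$

Then

$$
P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = 1\} = \dfrac{1}{n}, & \text{if } 0 < \varepsilon < 1, \\ 0, & \text{if } \varepsilon \ge 1. \end{cases}
$$

It follows that P{|Xn| > ε} → 0 as n → ∞, and we conclude that $X_n \xrightarrow{P} 0$.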
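The truth of the following statements can easily be verified.

1. $X_n \xrightarrow{P} X \iff X_n - X \xrightarrow{P} 0$.

2. $X_n \xrightarrow{P} X$, $X_n \xrightarrow{P} Y \Rightarrow P\{X = Y\} = 1$, for $P\{|X - Y| > c\} \le P\{|X_n - X| > c/2\} + P\{|X_n - Y| > c/2\}$, and it follows that P{|X − Y| > c} = 0 for every c > 0.

3. $X_n \xrightarrow{P} X \Rightarrow X_n - X_m \xrightarrow{P} 0$ as n, m → ∞, for

$$
P\{|X_n - X_m| > \varepsilon\} \le P\left\{|X_n - X| > \frac{\varepsilon}{2}\right\} + P\left\{|X_m - X| > \frac{\varepsilon}{2}\right\}.
$$

4. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n \pm Y_n \xrightarrow{P} X \pm Y$.

5. $X_n \xrightarrow{P} X$, k constant $\Rightarrow kX_n \xrightarrow{P} kX$.

6. $X_n \xrightarrow{P} k \Rightarrow X_n^2 \xrightarrow{P} k^2$.

7. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, a, b constants $\Rightarrow X_nY_n \xrightarrow{P} ab$, for

$$
X_nY_n = \frac{(X_n + Y_n)^2 - (X_n - Y_n)^2}{4} \xrightarrow{P} \frac{(a + b)^2 - (a - b)^2}{4} = ab.
$$

8. $X_n \xrightarrow{P} 1 \Rightarrow X_n^{-1} \xrightarrow{P} 1$, for, with 0 < ε < 1,

$$
P\left\{\left|\frac{1}{X_n} - 1\right| \ge \varepsilon\right\} = P\left\{\frac{1}{X_n} \ge 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \le 0\right\} + P\left\{0 < \frac{1}{X_n} \le 1 - \varepsilon\right\},
$$

and each of the three terms on the right goes to 0 as n → ∞.

9. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, a, b constants, b ≠ 0 $\Rightarrow X_nY_n^{-1} \xrightarrow{P} ab^{-1}$.

10. $X_n \xrightarrow{P} X$ and Y an RV $\Rightarrow X_nY \xrightarrow{P} XY$. Note that Y is an RV, so that given δ > 0, there exists a k > 0 such that P{|Y| > k} < δ/2. Thus

$$
\begin{aligned}
P\{|X_nY - XY| > \varepsilon\} &= P\{|X_n - X||Y| > \varepsilon, |Y| > k\} + P\{|X_n - X||Y| > \varepsilon, |Y| \le k\} \\
&< \frac{\delta}{2} + P\left\{|X_n - X| > \frac{\varepsilon}{k}\right\}.
\end{aligned}
$$

11. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_nY_n \xrightarrow{P} XY$, for

$$
(X_n - X)(Y_n - Y) \xrightarrow{P} 0.
$$

The result now follows on multiplication, using result 10. It also follows that
$X_n \xrightarrow{P} X \Rightarrow X_n^2 \xrightarrow{P} X^2$.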


Theorem 4. Let $X_n \xrightarrow{P} X$, and g be a continuous function defined on R. Then
$g(X_n) \xrightarrow{P} g(X)$ as n → ∞.


Proof. Since X is an RV, we can, given ε > 0, find a constant k = k(ε) such
that

$$
P\{|X| > k\} < \frac{\varepsilon}{2}.
$$

Also, g is continuous on R, so that g is uniformly continuous on [−2k, 2k]. It follows
that there exists a δ = δ(ε, k), 0 < δ ≤ k, such that

$$
|g(x_n) - g(x)| < \varepsilon
$$

whenever |x| ≤ k and |xn − x| < δ. Let

$$
A = \{|X| \le k\}, \qquad B = \{|X_n - X| < \delta\}, \qquad C = \{|g(X_n) - g(X)| < \varepsilon\}.
$$

Then ω ∈ A ∩ B ⟹ ω ∈ C, so that A ∩ B ⊆ C. It follows that

$$
P\{C^c\} \le P\{A^c\} + P\{B^c\},
$$

that is,

$$
P\{|g(X_n) - g(X)| \ge \varepsilon\} \le P\{|X_n - X| \ge \delta\} + P\{|X| > k\} < \varepsilon
$$

for n ≥ N(ε, δ, k), where N(ε, δ, k) is chosen so that

$$
P\{|X_n - X| \ge \delta\} < \frac{\varepsilon}{2} \quad \text{for } n \ge N(\varepsilon, \delta, k).
$$
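A simulation can make Theorem 4 concrete. The sketch below assumes, purely for illustration, that X ~ N(0, 1), that Xn = X + W/n with W ~ N(0, 1) independent of X (so that Xn → X in probability), and that g = exp; it estimates P{|g(Xn) − g(X)| > ε} for ε = 0.1 and increasing n.

```python
import math
import random

def prob_far(n, eps, reps, rng, g=math.exp):
    """Monte Carlo estimate of P{|g(X_n) - g(X)| > eps} with X_n = X + W/n."""
    count = 0
    for _ in range(reps):
        x = rng.gauss(0.0, 1.0)
        xn = x + rng.gauss(0.0, 1.0) / n     # X_n - X = W/n tends to 0 in probability
        if abs(g(xn) - g(x)) > eps:
            count += 1
    return count / reps

rng = random.Random(7)
estimates = {n: prob_far(n, eps=0.1, reps=20_000, rng=rng) for n in (1, 10, 1000)}
print(estimates)   # decreasing toward 0 as n grows
```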

Corollary 1. $X_n \xrightarrow{P} c$, where c is a constant $\Rightarrow g(X_n) \xrightarrow{P} g(c)$, g being a
continuous function.


We remark that a more general result than Theorem 4 is true and state it without
proof (see Rao [86, p. 124]): $X_n \xrightarrow{L} X$ and g continuous on R $\Rightarrow g(X_n) \xrightarrow{L} g(X)$.
The following two theorems explain the relationship between weak convergence
and convergence in probability.
Theorem 5. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.
Proof. Let Fn and F, respectively, be the DFs of Xn and X. We have

$$
\begin{aligned}
\{\omega : X(\omega) \le x'\} &= \{\omega : X_n(\omega) \le x, X(\omega) \le x'\} \cup \{\omega : X_n(\omega) > x, X(\omega) \le x'\} \\
&\subseteq \{X_n \le x\} \cup \{X_n > x, X \le x'\}.
\end{aligned}
$$

It follows that

$$
F(x') \le F_n(x) + P\{X_n > x, X \le x'\}.
$$

Since $X_n - X \xrightarrow{P} 0$, we have for x′ < x,

$$
P\{X_n > x, X \le x'\} \le P\{|X_n - X| \ge x - x'\} \to 0 \quad \text{as } n \to \infty.
$$

Therefore,

$$
F(x') \le \liminf_{n\to\infty} F_n(x), \qquad x' < x.
$$

Similarly, by interchanging X and Xn, and x and x′, we get

$$
\limsup_{n\to\infty} F_n(x) \le F(x''), \qquad x < x''.
$$

Thus, for x′ < x < x″, we have

$$
F(x') \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le F(x'').
$$

Since F has only a countable number of discontinuity points, we choose x to be a
point of continuity of F, and letting x″ ↓ x and x′ ↑ x, we have

$$
F(x) = \lim_{n\to\infty} F_n(x)
$$

at all points of continuity of F.
Theorem 6. Let k be a constant. Then
$X_n \xrightarrow{L} k \Rightarrow X_n \xrightarrow{P} k$.
The proof is left as an exercise.


Corollary. Let k be a constant. Then

$X_n \xrightarrow{L} k \iff X_n \xrightarrow{P} k$.
Remark 2. We emphasize that we cannot improve the result above by replacing k by an RV; that is, $X_n \xrightarrow{L} X$, in general, does not imply $X_n \xrightarrow{P} X$, for let
X, X1, X2, ... be identically distributed RVs, and let the joint distribution of (Xn, X)
be as follows:

              X = 0    X = 1
    Xn = 0      0       1/2
    Xn = 1     1/2       0

Clearly, $X_n \xrightarrow{L} X$. But

$$
\begin{aligned}
P\{|X_n - X| > \tfrac{1}{2}\} &= P\{|X_n - X| = 1\} \\
&= P\{X_n = 0, X = 1\} + P\{X_n = 1, X = 0\} \\
&= 1 \not\to 0.
\end{aligned}
$$

Hence, Xn does not converge to X in probability, but $X_n \xrightarrow{L} X$.
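The counterexample in Remark 2 can be checked mechanically. The sketch below encodes the joint PMF that puts mass 1/2 on each of (Xn, X) = (0, 1) and (1, 0), which is what identical distribution of Xn and X together with P{|Xn − X| = 1} = 1 forces when both take only the values 0 and 1, and confirms that the two marginals agree while P{|Xn − X| > 1/2} = 1.

```python
from fractions import Fraction

# Joint PMF of (X_n, X) assumed for the counterexample: mass 1/2 on (0, 1) and on (1, 0).
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}

def marginal(index, value):
    """P{component `index` of the pair equals value}."""
    return sum(p for pair, p in joint.items() if pair[index] == value)

same_law = all(marginal(0, v) == marginal(1, v) for v in (0, 1))   # X_n and X identically distributed
prob_far = sum(p for (xn, x), p in joint.items() if abs(xn - x) > Fraction(1, 2))
print(same_law, prob_far)   # True 1
```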


Remark 3. Example 3 shows that $X_n \xrightarrow{L} X$ does not imply that EXn^k → EX^k
for any k > 0, k integral.


Definition 3. Let {Xn} be a sequence of RVs such that E|Xn|^r < ∞ for some
r > 0. We say that Xn converges in the rth mean to an RV X if E|X|^r < ∞ and

$$
E|X_n - X|^r \to 0 \quad \text{as } n \to \infty, \tag{3}
$$

and we write $X_n \xrightarrow{r} X$.


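Example 6. Let {Xn} be a sequence of RVs defined by

$$
P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots.
$$

Then

$$
E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,
$$

and we see that $X_n \xrightarrow{2} X$, where the RV X is degenerate at 0.


Theorem 7. Let $X_n \xrightarrow{r} X$ for some r > 0. Then $X_n \xrightarrow{P} X$.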
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The proof is left as an exercise. 


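Example 7. Let $\{X_n\}$ be a sequence of RVs defined by
$$P\{X_n = 0\} = 1 - \frac{1}{n^r}, \qquad P\{X_n = n\} = \frac{1}{n^r}, \qquad r > 0,\ n = 1, 2, \ldots.$$
Then $E|X_n|^r = 1$, so that $X_n \not\xrightarrow{r} 0$. We show that $X_n \xrightarrow{P} 0$. Indeed,
$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = n\} = 1/n^r & \text{if } \varepsilon < n, \\ 0 & \text{if } \varepsilon \ge n, \end{cases}$$
which tends to 0 as $n \to \infty$.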
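To see Examples 6 and 7 side by side, the Python sketch below (function names are ours, chosen only for illustration) evaluates the exact moments and tail probabilities of the two-point laws with rational arithmetic. It assumes an integer exponent $r$, so that every quantity stays exact.

```python
from fractions import Fraction

def example6_second_moment(n):
    """Example 6: P{X_n = 1} = 1/n, P{X_n = 0} = 1 - 1/n, so E|X_n|^2 = 1/n."""
    return Fraction(1, n)

def example7(n, r, eps):
    """Example 7: P{X_n = n} = n**(-r), P{X_n = 0} = 1 - n**(-r).

    Returns (E|X_n|^r, P{|X_n| > eps}) as exact fractions.
    Assumes r is a positive integer and eps > 0.
    """
    p_n = Fraction(1, n**r)                  # P{X_n = n}
    moment = n**r * p_n                      # |n|^r * P{X_n = n}; the atom at 0 adds nothing
    tail = p_n if eps < n else Fraction(0)   # |X_n| > eps only on {X_n = n}, and only if n > eps
    return moment, tail

for n in (10, 100, 1000):
    print(n, example6_second_moment(n), *example7(n, r=2, eps=0.5))
```

The moment in Example 7 stays at 1 for every $n$ while the tail shrinks like $n^{-r}$, which is exactly the gap between convergence in probability and convergence in $r$th mean.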


Theorem 8. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{2} X$. Then $EX_n \to EX$ and $EX_n^2 \to EX^2$ as $n \to \infty$.

Proof. We have
$$|E(X_n - X)| \le E|X_n - X| \le E^{1/2}|X_n - X|^2 \to 0 \quad \text{as } n \to \infty.$$
To see that $EX_n^2 \to EX^2$ (see also Theorem 9), we write
$$EX_n^2 = E(X_n - X)^2 + EX^2 + 2E\{X(X_n - X)\}$$
and note that
$$|E\{X(X_n - X)\}| \le \sqrt{EX^2\, E(X_n - X)^2}$$
by the Cauchy-Schwarz inequality. The result follows on passing to the limits. We get, in addition, that $X_n \xrightarrow{2} X$ implies that $\operatorname{var}(X_n) \to \operatorname{var}(X)$.


Corollary. Let $\{X_m\}$, $\{Y_n\}$ be two sequences of RVs such that $X_m \xrightarrow{2} X$, $Y_n \xrightarrow{2} Y$. Then $E(X_mY_n) \to E(XY)$ as $m, n \to \infty$.

The proof is left to the reader.
As a simple consequence of Theorem 8 and its corollary we see that $X_m \xrightarrow{2} X$, $Y_n \xrightarrow{2} Y$ together imply that $\operatorname{cov}(X_m, Y_n) \to \operatorname{cov}(X, Y)$.
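Theorem 9. If $X_n \xrightarrow{r} X$, then $E|X_n|^r \to E|X|^r$.
Proof. Let $0 < r \le 1$. Then, since $|a + b|^r \le |a|^r + |b|^r$ for $0 < r \le 1$,
$$E|X_n|^r = E|X_n - X + X|^r \le E|X_n - X|^r + E|X|^r,$$
so that
$$E|X_n|^r - E|X|^r \le E|X_n - X|^r.$$
Interchanging $X_n$ and $X$, we get
$$E|X|^r - E|X_n|^r \le E|X_n - X|^r.$$
It follows that
$$\bigl|E|X|^r - E|X_n|^r\bigr| \le E|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$
For $r > 1$, we use Minkowski's inequality and obtain
$$[E|X_n|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X|^r]^{1/r}$$
and
$$[E|X|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X_n|^r]^{1/r}.$$
It follows that
$$\bigl|E^{1/r}|X_n|^r - E^{1/r}|X|^r\bigr| \le E^{1/r}|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$
This completes the proof.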
Theorem 10. Let $r > s$. Then $X_n \xrightarrow{r} X \implies X_n \xrightarrow{s} X$.
Proof. From Theorem 3.4.3 it follows that for $s < r$,
$$E|X_n - X|^s \le [E|X_n - X|^r]^{s/r} \to 0 \quad \text{as } n \to \infty$$
since $X_n \xrightarrow{r} X$.

Remark 4. Clearly, the converse to Theorem 10 cannot hold, since $E|X|^s < \infty$ for $s < r$ does not imply that $E|X|^r < \infty$.

Remark 5. In view of Theorem 9, it follows that $X_n \xrightarrow{r} X \implies E|X_n|^s \to E|X|^s$ for $s \le r$.


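Definition 4.† Let $\{X_n\}$ be a sequence of RVs. We say that $X_n$ converges almost surely (a.s.) to an RV $X$ if and only if
$$P\{\omega: X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1, \qquad (4)$$
and we write $X_n \xrightarrow{a.s.} X$ or $X_n \to X$ with probability 1.

†May be omitted on the first reading.

The following result elucidates Definition 4.

Theorem 11. $X_n \xrightarrow{a.s.} X$ if and only if $\lim_{n\to\infty} P\{\sup_{m \ge n} |X_m - X| > \varepsilon\} = 0$ for all $\varepsilon > 0$.

Proof. Since $X_n \xrightarrow{a.s.} X \iff X_n - X \xrightarrow{a.s.} 0$, it will be sufficient to show the equivalence of
(a) $X_n \xrightarrow{a.s.} 0$ and
(b) $\lim_{n\to\infty} P\{\sup_{m \ge n} |X_m| > \varepsilon\} = 0$ for all $\varepsilon > 0$.

Let us suppose that (a) holds. Let $\varepsilon > 0$, and write
$$A_n(\varepsilon) = \Bigl\{\sup_{m \ge n} |X_m| > \varepsilon\Bigr\} \quad \text{and} \quad C = \Bigl\{\lim_{n\to\infty} X_n = 0\Bigr\}.$$
Also write $B_n(\varepsilon) = C \cap A_n(\varepsilon)$, and note that $B_{n+1}(\varepsilon) \subset B_n(\varepsilon)$ and that the limit set $\bigcap_{n=1}^\infty B_n(\varepsilon) = \varnothing$. It follows that
$$\lim_{n\to\infty} PB_n(\varepsilon) = P\Bigl(\bigcap_{n=1}^\infty B_n(\varepsilon)\Bigr) = 0.$$
Since $PC = 1$, $PC^c = 0$, and we have
$$\begin{aligned} PB_n(\varepsilon) &= P(A_n \cap C) = 1 - P(C^c \cup A_n^c) \\ &= 1 - PC^c - PA_n^c + P(C^c \cap A_n^c) \\ &= PA_n + P(C^c \cap A_n^c) \\ &= PA_n, \end{aligned}$$
where we used $PC^c = 0$ and $P(C^c \cap A_n^c) \le PC^c = 0$. It follows that (b) holds.

Conversely, let $\lim_{n\to\infty} PA_n(\varepsilon) = 0$ for every $\varepsilon > 0$, and write
$$D(\varepsilon) = \Bigl\{\limsup_{n\to\infty} |X_n| > \varepsilon\Bigr\}.$$
Since $D(\varepsilon) \subset A_n(\varepsilon)$ for $n = 1, 2, \ldots$, it follows that $PD(\varepsilon) = 0$. Also,
$$C^c = \Bigl\{\lim_{n\to\infty} X_n = 0\Bigr\}^c \subset \bigcup_{k=1}^\infty \Bigl\{\limsup_{n\to\infty} |X_n| > \frac{1}{k}\Bigr\},$$
so that
$$1 - PC \le \sum_{k=1}^\infty PD\Bigl(\frac{1}{k}\Bigr) = 0,$$
and (a) holds.

Remark 6. Thus $X_n \xrightarrow{a.s.} 0$ means that for $\varepsilon > 0$, $\eta > 0$ arbitrary, we can find an $n_0$ such that
$$P\Bigl\{\sup_{n \ge n_0} |X_n| > \varepsilon\Bigr\} < \eta. \qquad (5)$$
Indeed, we can write, equivalently, that
$$\lim_{n_0 \to \infty} P\Bigl\{\bigcup_{n \ge n_0} \{|X_n| > \varepsilon\}\Bigr\} = 0. \qquad (6)$$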


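Theorem 12. $X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X$.

Proof. By Remark 6, $X_n \xrightarrow{a.s.} X$ implies that for arbitrary $\varepsilon > 0$, $\eta > 0$, we can choose an $n_0 = n_0(\varepsilon, \eta)$ such that
$$P\Bigl\{\bigcap_{n \ge n_0} \{|X_n - X| \le \varepsilon\}\Bigr\} \ge 1 - \eta.$$
Clearly,
$$\bigcap_{m \ge n_0} \{|X_m - X| \le \varepsilon\} \subset \{|X_n - X| \le \varepsilon\} \quad \text{for } n \ge n_0.$$
It follows that for $n \ge n_0$,
$$P\{|X_n - X| \le \varepsilon\} \ge P\Bigl\{\bigcap_{m \ge n_0} \{|X_m - X| \le \varepsilon\}\Bigr\} \ge 1 - \eta,$$
that is,
$$P\{|X_n - X| > \varepsilon\} \le \eta \quad \text{for } n \ge n_0,$$
which is the same as saying that $X_n \xrightarrow{P} X$.
That the converse of Theorem 12 does not hold is shown in the following example.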


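Example 8. For each positive integer $n$ there exist integers $m$ and $k$ (uniquely determined) such that
$$n = 2^k + m, \qquad 0 \le m < 2^k, \qquad k = 0, 1, 2, \ldots.$$
Thus, for $n = 1$, $k = 0$ and $m = 0$; for $n = 5$, $k = 2$ and $m = 1$; and so on. Define RVs $X_n$ for $n = 1, 2, \ldots$ on $\Omega = [0, 1]$ by
$$X_n(\omega) = \begin{cases} 2^k, & \dfrac{m}{2^k} \le \omega < \dfrac{m+1}{2^k}, \\ 0, & \text{otherwise}. \end{cases}$$
Let the probability distribution on $\Omega$ be given by $P\{I\} =$ length of the interval $I \subset \Omega$. Thus
$$P\{X_n = 2^k\} = \frac{1}{2^k} \quad \text{and} \quad P\{X_n = 0\} = 1 - \frac{1}{2^k}.$$
The limit $\lim_{n\to\infty} X_n(\omega)$ does not exist for any $\omega \in [0, 1)$: for each $k$ the intervals $[m/2^k, (m+1)/2^k)$, $0 \le m < 2^k$, partition $[0, 1)$, so within every block $2^k \le n < 2^{k+1}$ the sequence $X_n(\omega)$ takes the value $2^k$ exactly once and the value 0 otherwise. Hence $X_n$ does not converge almost surely. But
$$P\{|X_n| > \varepsilon\} = P\{X_n > \varepsilon\} = \begin{cases} 0 & \text{if } \varepsilon \ge 2^k, \\ \dfrac{1}{2^k} & \text{if } 0 < \varepsilon < 2^k, \end{cases}$$
and we see that
$$P\{|X_n| > \varepsilon\} \to 0 \quad \text{as } n \text{ (and hence } k) \to \infty.$$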
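A short Python sketch of the construction in Example 8 (helper names are our own) makes both facts concrete. It decomposes $n = 2^k + m$ with integer bit arithmetic, evaluates one sample path at a fixed $\omega$, and computes $P\{|X_n| > \varepsilon\}$ under Lebesgue measure on $[0, 1]$.

```python
def k_and_m(n):
    """Return (k, m) with n = 2**k + m and 0 <= m < 2**k, for n >= 1."""
    k = n.bit_length() - 1            # largest k with 2**k <= n
    return k, n - (1 << k)

def X(n, omega):
    """X_n(omega) = 2**k on [m/2**k, (m+1)/2**k) and 0 elsewhere in [0, 1]."""
    k, m = k_and_m(n)
    return 2**k if m / 2**k <= omega < (m + 1) / 2**k else 0

def tail_prob(n, eps):
    """P{|X_n| > eps} = 2**(-k) if eps < 2**k, else 0 (Lebesgue measure on [0, 1])."""
    k, _ = k_and_m(n)
    return 2.0**-k if eps < 2**k else 0.0

omega = 0.3
# Within each block 2**k <= n < 2**(k+1) the path hits 2**k exactly once and is 0
# elsewhere, so it is large infinitely often and zero infinitely often.
for k in range(1, 6):
    block = [X(n, omega) for n in range(2**k, 2**(k + 1))]
    print(k, block.count(2**k), block.count(0))
```

The path never settles, yet the tail probability is $2^{-k}$, which is the whole content of the example.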


Theorem 13. Let $\{X_n\}$ be a strictly decreasing sequence of positive RVs, and suppose that $X_n \xrightarrow{P} 0$. Then $X_n \xrightarrow{a.s.} 0$.

The proof is left as an exercise.


Example 9. Let $\{X_n\}$ be a sequence of independent RVs defined by
$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots.$$
Then
$$E|X_n - 0|^2 = E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$
so that $X_n \xrightarrow{2} 0$. Also,
$$P\{X_n = 0 \text{ for every } m \le n \le n_0\} = \prod_{n=m}^{n_0} \Bigl(1 - \frac{1}{n}\Bigr) = \frac{m-1}{n_0},$$
which converges to zero as $n_0 \to \infty$ for all values of $m$. Thus $X_n$ does not converge to 0 with probability 1.

Example 10. Let $\{X_n\}$ be independent, defined by
$$P\{X_n = 0\} = 1 - \frac{1}{n^r} \quad \text{and} \quad P\{X_n = n\} = \frac{1}{n^r}, \qquad r \ge 2,\ n = 1, 2, \ldots.$$
Then
$$P\{X_n = 0 \text{ for } m \le n \le n_0\} = \prod_{n=m}^{n_0} \Bigl(1 - \frac{1}{n^r}\Bigr).$$
As $n_0 \to \infty$, the infinite product converges to some nonzero quantity (for $m \ge 2$), which itself converges to 1 as $m \to \infty$. Thus $X_n \xrightarrow{a.s.} 0$. However, $E|X_n|^r = 1$, and $X_n \not\xrightarrow{r} 0$ as
$n \to \infty$.
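The products in Examples 9 and 10 can be checked exactly. The Python sketch below (our own helper, using rational arithmetic) computes $\prod_{n=m}^{n_0}(1 - n^{-r})$; with $r = 1$ it reproduces the telescoping value $(m-1)/n_0$ of Example 9, and with $r = 2$ (taken here as an illustrative instance of Example 10) it telescopes to $(m-1)(n_0+1)/(mn_0)$, which tends to $(m-1)/m$ as $n_0 \to \infty$.

```python
from fractions import Fraction

def prob_all_zero(m, n0, r=1):
    """P{X_n = 0 for every m <= n <= n0} = prod_{n=m}^{n0} (1 - 1/n**r)
    for independent X_n with P{X_n != 0} = 1/n**r."""
    p = Fraction(1)
    for n in range(m, n0 + 1):
        p *= 1 - Fraction(1, n**r)
    return p

# Example 9 (r = 1): tends to 0 as n0 grows, whatever m is.
print([float(prob_all_zero(10, n0)) for n0 in (100, 1000, 10000)])
# r = 2: stays near (m - 1)/m, which approaches 1 as m grows.
print([float(prob_all_zero(m, 10000, r=2)) for m in (2, 10, 100)])
```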


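Example 11. Let $\{X_n\}$ be a sequence of RVs with $P\{|X_n| = 1/n\} = 1$, that is, $X_n$ takes only the values $\pm 1/n$. Then
$$E|X_n|^r = \frac{1}{n^r} \to 0 \quad \text{as } n \to \infty,$$
and $X_n \xrightarrow{r} 0$. For $j < k$, $|X_j| > |X_k|$, so that $\{|X_k| > \varepsilon\} \subset \{|X_j| > \varepsilon\}$. It follows that
$$\bigcup_{j=n}^\infty \{|X_j| > \varepsilon\} = \{|X_n| > \varepsilon\}.$$
Choosing $n > 1/\varepsilon$, we see that
$$P\Bigl\{\bigcup_{j=n}^\infty \{|X_j| > \varepsilon\}\Bigr\} = P\{|X_n| > \varepsilon\} = P\Bigl\{\frac{1}{n} > \varepsilon\Bigr\} = 0,$$
and (6) implies that $X_n \xrightarrow{a.s.} 0$.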


Remark 7. In Theorem 6.4.3 we prove a result that is sometimes useful in proving a.s. convergence of a sequence of RVs.


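Theorem 14. Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of pairs of RVs. Then
$$|X_n - Y_n| \xrightarrow{P} 0 \text{ and } Y_n \xrightarrow{L} Y \implies X_n \xrightarrow{L} Y.$$
Proof. Let $x$ be a point of continuity of the DF of $Y$ and $\varepsilon > 0$. Then
$$\begin{aligned} P\{X_n \le x\} &= P\{Y_n \le x + Y_n - X_n\} \\ &= P\{Y_n \le x + Y_n - X_n;\ Y_n - X_n \le \varepsilon\} + P\{Y_n \le x + Y_n - X_n;\ Y_n - X_n > \varepsilon\} \\ &\le P\{Y_n \le x + \varepsilon\} + P\{Y_n - X_n > \varepsilon\}. \end{aligned}$$
It follows that
$$\limsup_{n\to\infty} P\{X_n \le x\} \le \limsup_{n\to\infty} P\{Y_n \le x + \varepsilon\}.$$
Similarly,
$$\liminf_{n\to\infty} P\{X_n \le x\} \ge \liminf_{n\to\infty} P\{Y_n \le x - \varepsilon\}.$$
Since $\varepsilon > 0$ is arbitrary (and may be chosen so that $x \pm \varepsilon$ are continuity points of the DF of $Y$) and $x$ is a continuity point of $P\{Y \le x\}$, we get the result by letting $\varepsilon \to 0$.

Corollary. $X_n \xrightarrow{P} X \implies X_n \xrightarrow{L} X$.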


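Theorem 15 (Slutsky's Theorem). Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of pairs of RVs, and let $c$ be a constant. Then
(a) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \implies X_n + Y_n \xrightarrow{L} X + c$;
(b) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \implies X_nY_n \xrightarrow{L} cX$ if $c \ne 0$, and $X_nY_n \xrightarrow{P} 0$ if $c = 0$;
(c) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \implies X_n/Y_n \xrightarrow{L} X/c$ if $c \ne 0$.

Proof. (a) $X_n \xrightarrow{L} X \implies X_n + c \xrightarrow{L} X + c$ (Theorem 3). Also, $Y_n - c = (Y_n + X_n) - (X_n + c) \xrightarrow{P} 0$. A simple use of Theorem 14 shows that
$$X_n + Y_n \xrightarrow{L} X + c.$$
(b) We first consider the case where $c = 0$. We have, for any fixed number $k > 0$,
$$\begin{aligned} P\{|X_nY_n| > \varepsilon\} &= P\Bigl\{|X_nY_n| > \varepsilon,\ |Y_n| \le \frac{\varepsilon}{k}\Bigr\} + P\Bigl\{|X_nY_n| > \varepsilon,\ |Y_n| > \frac{\varepsilon}{k}\Bigr\} \\ &\le P\{|X_n| > k\} + P\Bigl\{|Y_n| > \frac{\varepsilon}{k}\Bigr\}. \end{aligned}$$
Since $Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{L} X$, it follows that for any fixed $k > 0$,
$$\limsup_{n\to\infty} P\{|X_nY_n| > \varepsilon\} \le P\{|X| \ge k\}.$$
Since $k$ is arbitrary, we can make $P\{|X| \ge k\}$ as small as we please by choosing $k$ large. It follows that
$$X_nY_n \xrightarrow{P} 0.$$
Now, let $c \ne 0$. Then
$$X_nY_n - cX_n = X_n(Y_n - c),$$
and since $X_n \xrightarrow{L} X$ and $Y_n - c \xrightarrow{P} 0$, the case just proved gives $X_n(Y_n - c) \xrightarrow{P} 0$. Since also $cX_n \xrightarrow{L} cX$, using Theorem 14 we get the result that
$$X_nY_n \xrightarrow{L} cX.$$
(c) $Y_n \xrightarrow{P} c$ and $c \ne 0 \implies Y_n^{-1} \xrightarrow{P} c^{-1}$. It follows that $X_n \xrightarrow{L} X$, $Y_n^{-1} \xrightarrow{P} c^{-1} \implies X_nY_n^{-1} \xrightarrow{L} c^{-1}X$, and the proof of the theorem is complete.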


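As an application of Theorem 15, we present the following example. Many more examples appear in Chapter 7.

Example 12. Let $X_1, X_2, \ldots$ be iid RVs with common law $N(0, 1)$. We shall determine the limiting distribution of the RV
$$W_n = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{X_1^2 + X_2^2 + \cdots + X_n^2}}.$$
Let us write
$$U_n = \frac{1}{\sqrt{n}}(X_1 + X_2 + \cdots + X_n) \quad \text{and} \quad V_n = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}.$$
Then $W_n = U_n/\sqrt{V_n}$.

For the MGF of $U_n$, we have
$$M_{U_n}(t) = \prod_{i=1}^n Ee^{tX_i/\sqrt{n}} = \prod_{i=1}^n e^{t^2/(2n)} = e^{t^2/2},$$
so that $U_n$ is an $N(0, 1)$ variate (see also Corollary 2 to Theorem 5.3.22). It follows that $U_n \xrightarrow{L} Z$, where $Z$ is an $N(0, 1)$ RV. As for $V_n$, we note that each $X_i^2$ is a chi-square variate with 1 d.f. Thus
$$M_{V_n}(t) = \prod_{i=1}^n \Bigl(1 - \frac{2t}{n}\Bigr)^{-1/2} = \Bigl(1 - \frac{2t}{n}\Bigr)^{-n/2}, \qquad t < \frac{n}{2},$$
which is the MGF of a gamma variate with parameters $\alpha = n/2$ and $\beta = 2/n$. Thus the density function of $V_n$ is given by
$$f_{V_n}(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\,(2/n)^{n/2}}\, x^{n/2 - 1}e^{-nx/2}, & 0 < x < \infty, \\ 0, & \text{otherwise}. \end{cases}$$
We will show that $V_n \xrightarrow{P} 1$. We have, for any $\varepsilon > 0$, by Chebychev's inequality,
$$P\{|V_n - 1| > \varepsilon\} \le \frac{\operatorname{var}(V_n)}{\varepsilon^2} = \frac{(n/2)(2/n)^2}{\varepsilon^2} = \frac{2}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$
We have thus shown that
$$U_n \xrightarrow{L} Z \quad \text{and} \quad V_n \xrightarrow{P} 1,$$
and hence also $\sqrt{V_n} \xrightarrow{P} 1$. It follows by Theorem 15(c) that $W_n = U_n/\sqrt{V_n} \xrightarrow{L} Z$, where $Z$ is an $N(0, 1)$ RV.

Later we will see that the condition that the $X_i$'s be $N(0, 1)$ is not needed. All we need is that $EX_i = 0$ and $E|X_i|^2 < \infty$.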
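Slutsky's theorem can be watched at work numerically. The Python sketch below (helper names are illustrative; it relies on the standard-library `random.gauss` generator with a fixed seed) simulates $W_n = U_n/\sqrt{V_n}$ from Example 12 and compares its sample mean, variance and central 95% mass with those of $N(0, 1)$. It also encodes the Chebychev bound $2/(n\varepsilon^2)$ for $V_n$.

```python
import math
import random

def w_n(xs):
    """W_n = (X_1 + ... + X_n) / sqrt(X_1^2 + ... + X_n^2) = U_n / sqrt(V_n)."""
    return sum(xs) / math.sqrt(sum(x * x for x in xs))

def simulate_w(n, reps, seed=0):
    """Draw `reps` independent copies of W_n built from iid N(0, 1) samples."""
    rng = random.Random(seed)
    return [w_n([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]

def chebyshev_bound_v(n, eps):
    """The bound P{|V_n - 1| > eps} <= var(V_n)/eps^2 = 2/(n eps^2) from the text."""
    return 2.0 / (n * eps * eps)

ws = simulate_w(n=100, reps=4000)
mean = sum(ws) / len(ws)
var = sum((w - mean) ** 2 for w in ws) / (len(ws) - 1)
inside = sum(abs(w) <= 1.96 for w in ws) / len(ws)
print(f"mean={mean:.3f} var={var:.3f} P(|W|<=1.96)~{inside:.3f}")
```

Since $|W_n| \le \sqrt{n}$ by the Cauchy-Schwarz inequality, $W_n$ is never exactly normal, yet for $n = 100$ its law is already very close to $N(0, 1)$.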


PROBLEMS 6.2 


1. Let $X_1, X_2, \ldots$ be a sequence of RVs with corresponding DFs given by $F_n(x) = 0$ if $x < -n$, $= (x + n)/2n$ if $-n \le x < n$, and $= 1$ if $x \ge n$. Does $F_n$ converge to a DF?


2. Let $X_1, X_2, \ldots$ be iid $N(0, 1)$ RVs. Consider the sequence of RVs $\{\bar X_n\}$, where $\bar X_n = n^{-1}\sum_{i=1}^n X_i$. Let $F_n$ be the DF of $\bar X_n$, $n = 1, 2, \ldots$. Find $\lim_{n\to\infty} F_n(x)$. Is this limit a DF?


3. Let $X_1, X_2, \ldots$ be iid $U(0, \theta)$ RVs. Let $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, and consider the sequence $Y_n = nX_{(1)}$. Does $Y_n$ converge in distribution to some RV $Y$? If so, find the DF of RV $Y$.


4. Let $X_1, X_2, \ldots$ be iid RVs with common absolutely continuous DF $F$. Let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$, and consider the sequence of RVs $Y_n = n[1 - F(X_{(n)})]$. Find the limiting DF of $Y_n$.


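5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common PDF $f(x) = e^{-(x - \theta)}$ if $x \ge \theta$, and $= 0$ if $x < \theta$. Write $\bar X_n = n^{-1}\sum_{i=1}^n X_i$.

(a) Show that $\bar X_n \xrightarrow{P} 1 + \theta$.
(b) Show that $\min(X_1, X_2, \ldots, X_n) \xrightarrow{P} \theta$.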


6. Let $X_1, X_2, \ldots$ be iid $U[0, \theta]$ RVs. Show that $\max(X_1, X_2, \ldots, X_n) \xrightarrow{P} \theta$.

7. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{P} X$. Let $a_n$ be a sequence of positive constants such that $a_n \to \infty$ as $n \to \infty$. Show that $a_n^{-1}X_n \xrightarrow{P} 0$.


8. Let $\{X_n\}$ be a sequence of RVs such that $P\{|X_n| \le k\} = 1$ for all $n$ and some constant $k > 0$. Suppose that $X_n \xrightarrow{P} X$. Show that $X_n \xrightarrow{r} X$ for any $r > 0$.


9. Let $X_1, X_2, \ldots, X_{2n}$ be iid $N(0, 1)$ RVs. Define


U, 
V, = X24 X2 X2, and 2, = у 
п 


Find the limiting distribution of Z,. 


10. Let $\{X_n\}$ be a sequence of geometric RVs with parameter $\lambda/n$, $n > \lambda > 0$. Also, let $Z_n = X_n/n$. Show that $Z_n \xrightarrow{L} G(1, 1/\lambda)$ as $n \to \infty$. (Prochaska [80])


11. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{a.s.} 0$, and let $c_n$ be a sequence of real numbers such that $c_n \to 0$ as $n \to \infty$. Show that $X_n + c_n \xrightarrow{a.s.} 0$.


12. Does convergence almost surely imply convergence of moments?


13. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F$, and write $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$, $n = 1, 2, \ldots$.

(a) For $\alpha > 0$, $\lim_{x\to\infty} x^\alpha P\{X_1 > x\} = b > 0$. Find the limiting distribution of $(bn)^{-1/\alpha}X_{(n)}$. Also, find the PDF corresponding to the limiting DF and compute its moments.

(b) If $F$ satisfies
$$\lim_{x\to\infty} e^x[1 - F(x)] = b > 0,$$
find the limiting DF of $X_{(n)} - \log(bn)$ and compute the corresponding PDF and the MGF.

(c) If $X_1$ is bounded above by $x_0$ with probability 1, and for some $\alpha > 0$
$$\lim_{x\to x_0^-} (x_0 - x)^{-\alpha}[1 - F(x)] = b > 0,$$
find the limiting distribution of $(bn)^{1/\alpha}\{X_{(n)} - x_0\}$, the corresponding PDF, and the moments of the limiting distribution.

(The remarkable result above, due to Gnedenko [33], exhausts all limiting distributions of $X_{(n)}$ with suitable norming and centering.)


14. Let $\{F_n\}$ be a sequence of DFs that converges weakly to a DF $F$ that is continuous everywhere. Show that $F_n(x)$ converges to $F(x)$ uniformly.

15. Prove Theorem 1.

16. Prove Theorem 6.

17. Prove Theorem 13.

18. Prove Corollary 1 to Theorem 8.


19. Let $V$ be the class of all random variables defined on a probability space with finite expectations, and for $X \in V$ define
$$\rho(X) = E\Bigl\{\frac{|X|}{1 + |X|}\Bigr\}.$$
Show the following:
(a) $\rho(X + Y) \le \rho(X) + \rho(Y)$; $\rho(\sigma X) \le \max(|\sigma|, 1)\rho(X)$.
(b) $d(X, Y) = \rho(X - Y)$ is a distance function on $V$ (assuming that we identify RVs that are a.s. equal).
(c) $\lim_{n\to\infty} d(X_n, X) = 0 \iff X_n \xrightarrow{P} X$.

20. For the following sequences of RVs $\{X_n\}$, investigate convergence in probability and convergence in $r$th mean.
(a) $X_n \sim C(1/n, 0)$.
(b) $P\{X_n = e^n\} = 1/n^2$, $P\{X_n = 0\} = 1 - 1/n^2$.


6.3 WEAK LAW OF LARGE NUMBERS


Let $\{X_n\}$ be a sequence of RVs. Write $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \ldots$. In this section we answer the following question in the affirmative: Do there exist sequences of constants $A_n$ and $B_n > 0$, $B_n \to \infty$ as $n \to \infty$, such that the sequence of RVs $B_n^{-1}(S_n - A_n)$ converges in probability to 0 as $n \to \infty$?


Definition 1. Let $\{X_n\}$ be a sequence of RVs, and let $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \ldots$. We say that $\{X_n\}$ obeys the weak law of large numbers (WLLN) with respect to the sequence of constants $\{B_n\}$, $B_n > 0$, $B_n \uparrow \infty$, if there exists a sequence of real constants $A_n$ such that $B_n^{-1}(S_n - A_n) \xrightarrow{P} 0$ as $n \to \infty$. $A_n$ are called centering constants, and $B_n$ norming constants.


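Theorem 1. Let $\{X_n\}$ be a sequence of pairwise uncorrelated RVs with $EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots$. If $\sum_{i=1}^n \sigma_i^2 \to \infty$ as $n \to \infty$, we can choose $A_n = \sum_{k=1}^n \mu_k$ and $B_n = \sum_{i=1}^n \sigma_i^2$, that is,
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\sum_{j=1}^n \sigma_j^2} \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$
Proof. We have, by Chebychev's inequality,
$$P\Bigl\{\Bigl|\sum_{k=1}^n (X_k - \mu_k)\Bigr| > \varepsilon\sum_{i=1}^n \sigma_i^2\Bigr\} \le \frac{E\{\sum_{i=1}^n (X_i - \mu_i)\}^2}{\varepsilon^2(\sum_{i=1}^n \sigma_i^2)^2} = \frac{1}{\varepsilon^2\sum_{i=1}^n \sigma_i^2} \to 0 \quad \text{as } n \to \infty,$$
since, the $X_i$ being pairwise uncorrelated, $E\{\sum_{i=1}^n (X_i - \mu_i)\}^2 = \sum_{i=1}^n \sigma_i^2$.

Corollary 1. If the $X_n$'s are identically distributed and pairwise uncorrelated with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$, we can choose $A_n = n\mu$ and $B_n = n\sigma^2$.

Corollary 2. In Theorem 1 we can choose $B_n = n$, provided that $n^{-2}\sum_{i=1}^n \sigma_i^2 \to 0$ as $n \to \infty$.

Corollary 3. In Corollary 1 we can take $A_n = n\mu$ and $B_n = n$, since $n\sigma^2/n^2 \to 0$ as $n \to \infty$. Thus, if $\{X_n\}$ are pairwise uncorrelated identically distributed RVs with finite variance, $S_n/n \xrightarrow{P} \mu$.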


Example 1. Let $X_1, X_2, \ldots$ be iid RVs with common law $b(1, p)$. Then $EX_i = p$, $\operatorname{var}(X_i) = p(1 - p)$, and we have
$$\frac{S_n}{n} \xrightarrow{P} p \quad \text{as } n \to \infty.$$
Note that $S_n/n$ is the proportion of successes in $n$ trials.
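Example 1 can be made concrete. The Python sketch below (helper names are ours) computes the exact binomial probability $P\{|S_n/n - p| > \varepsilon\}$ and compares it with the Chebychev bound $p(1-p)/(n\varepsilon^2)$ used in the proof of Theorem 1. The exact tail falls much faster than the bound, which only has to tend to 0.

```python
from math import comb

def binomial_tail(n, p, eps):
    """Exact P{|S_n/n - p| > eps} for S_n ~ b(n, p).
    Terms are formed in floating point, so keep n <= 1000 to avoid overflow."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) > eps)

def chebyshev_bound(n, p, eps):
    """Chebychev bound var(S_n/n)/eps^2 = p(1 - p)/(n eps^2)."""
    return p * (1 - p) / (n * eps**2)

for n in (100, 400, 1000):
    print(n, binomial_tail(n, 0.3, 0.05), chebyshev_bound(n, 0.3, 0.05))
```

For $n$ much beyond 1000, `comb(n, k)` no longer fits in a float, so a log-scale computation would be needed.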


Hereafter, we shall be interested mainly in the case where $B_n = n$. When we say that $\{X_n\}$ obeys the WLLN, this is so with respect to the sequence $\{n\}$.


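Theorem 2. Let $\{X_n\}$ be any sequence of RVs. Write $Y_n = n^{-1}\sum_{k=1}^n X_k$. A necessary and sufficient condition for the sequence $\{X_n\}$ to satisfy the weak law of large numbers is that
$$E\Bigl\{\frac{Y_n^2}{1 + Y_n^2}\Bigr\} \to 0 \quad \text{as } n \to \infty. \qquad (1)$$

Proof. For any two positive numbers $a, b$, $a \ge b > 0$, we have
$$\frac{a}{1 + a} \cdot \frac{1 + b}{b} \ge 1. \qquad (2)$$
Let $A = \{|Y_n| \ge \varepsilon\}$. Then $\omega \in A \implies |Y_n|^2 \ge \varepsilon^2 > 0$. Using (2), we see that $\omega \in A$ implies that
$$\frac{Y_n^2}{1 + Y_n^2} \cdot \frac{1 + \varepsilon^2}{\varepsilon^2} \ge 1.$$
It follows that
$$PA \le P\Bigl\{\frac{Y_n^2}{1 + Y_n^2} \ge \frac{\varepsilon^2}{1 + \varepsilon^2}\Bigr\} \le \frac{E\{Y_n^2/(1 + Y_n^2)\}}{\varepsilon^2/(1 + \varepsilon^2)} \to 0 \quad \text{as } n \to \infty,$$
by Markov's inequality. That is,
$$Y_n \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$
Conversely, we will show that for every $\varepsilon > 0$,
$$P\{|Y_n| > \varepsilon\} \ge E\Bigl\{\frac{Y_n^2}{1 + Y_n^2}\Bigr\} - \varepsilon^2. \qquad (3)$$
We will prove (3) for the case in which $Y_n$ is of the continuous type. The discrete case being similar, we ask the reader to complete the proof. If $Y_n$ has PDF $f_n(y)$, then
$$\begin{aligned} E\Bigl\{\frac{Y_n^2}{1 + Y_n^2}\Bigr\} &= \int_{|y| > \varepsilon} \frac{y^2}{1 + y^2} f_n(y)\,dy + \int_{|y| \le \varepsilon} \frac{y^2}{1 + y^2} f_n(y)\,dy \\ &\le \int_{|y| > \varepsilon} f_n(y)\,dy + \int_{|y| \le \varepsilon} y^2 f_n(y)\,dy \\ &\le P\{|Y_n| > \varepsilon\} + \varepsilon^2, \end{aligned}$$
which is (3). Hence, if $Y_n \xrightarrow{P} 0$, then $\limsup_{n\to\infty} E\{Y_n^2/(1 + Y_n^2)\} \le \varepsilon^2$ for every $\varepsilon > 0$, and (1) follows.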


Remark 1. Since condition (1) applies not to the individual variables but to their sum, Theorem 2 is of limited use. We note, however, that all weak laws of large numbers obtained as corollaries to Theorem 1 follow easily from Theorem 2 (Problem 6).


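Example 2. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$, and $\operatorname{cov}(X_i, X_j) = \rho$ if $|j - i| = 1$, and $= 0$ otherwise. Then $S_n = \sum_{k=1}^n X_k$ is $N(0, \sigma^2)$, where
$$\sigma^2 = \operatorname{var}(S_n) = n + 2(n - 1)\rho,$$
and
$$\begin{aligned} E\Bigl\{\frac{Y_n^2}{1 + Y_n^2}\Bigr\} &= E\Bigl\{\frac{S_n^2}{n^2 + S_n^2}\Bigr\} = \frac{2}{\sigma\sqrt{2\pi}}\int_0^\infty \frac{x^2}{n^2 + x^2}\, e^{-x^2/(2\sigma^2)}\,dx \\ &= \sqrt{\frac{2}{\pi}}\int_0^\infty \frac{y^2[n + 2(n - 1)\rho]}{n^2 + y^2[n + 2(n - 1)\rho]}\, e^{-y^2/2}\,dy \\ &\le \frac{n + 2(n - 1)\rho}{n^2}\sqrt{\frac{2}{\pi}}\int_0^\infty y^2e^{-y^2/2}\,dy = \frac{n + 2(n - 1)\rho}{n^2} \to 0 \quad \text{as } n \to \infty. \end{aligned}$$
It follows from Theorem 2 that $n^{-1}S_n \xrightarrow{P} 0$. We invite the reader to compare this result to that of Problem 6.5.6.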


Example 3. Let $X_1, X_2, \ldots$ be iid $C(1, 0)$ RVs. We have seen (corollary to Theorem 5.3.18) that $n^{-1}S_n \sim C(1, 0)$, so that $n^{-1}S_n$ does not converge in probability to 0. It follows that the WLLN does not hold (see also Problem 10).
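The failure in Example 3 is visible in simulation. Since $n^{-1}S_n$ has the same $C(1, 0)$ law (scale 1, location 0) for every $n$, the probability that $|n^{-1}S_n| > 1$ stays at $\frac12$ instead of tending to 0. The Python sketch below (our own helper, sampling Cauchy variates by inversion with a fixed seed) estimates that probability for several $n$.

```python
import math
import random

def cauchy_mean_exceeds(n, reps, threshold=1.0, seed=4):
    """Fraction of `reps` simulated values of S_n/n with |S_n/n| > threshold,
    where X_1, ..., X_n are iid standard Cauchy, sampled as tan(pi (U - 1/2))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n))
        hits += int(abs(s / n) > threshold)
    return hits / reps

for n in (1, 10, 200):
    print(n, cauchy_mean_exceeds(n, 2000))
```

Averaging more observations does not concentrate $S_n/n$ at all, in sharp contrast to Example 1.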


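Let $X_1, X_2, \ldots$ be an arbitrary sequence of RVs, and let $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \ldots$. Let us truncate each $X_i$ at $c > 0$, that is, let
$$X_i^c = \begin{cases} X_i & \text{if } |X_i| \le c, \\ 0 & \text{if } |X_i| > c. \end{cases}$$
Write
$$S_n^c = \sum_{i=1}^n X_i^c \quad \text{and} \quad m_n = \sum_{i=1}^n EX_i^c.$$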
i=l i=l 
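Lemma 1. For any $\varepsilon > 0$,
$$P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^n P\{|X_k| > c\}. \qquad (4)$$
Proof. We have
$$\begin{aligned} P\{|S_n - m_n| > \varepsilon\} &= P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| \le c \text{ for } k = 1, 2, \ldots, n\} \\ &\quad + P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| > c \text{ for at least one } k,\ k = 1, 2, \ldots, n\} \\ &\le P\{|S_n^c - m_n| > \varepsilon\} + P\{|X_k| > c \text{ for at least one } k,\ 1 \le k \le n\} \\ &\le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^n P\{|X_k| > c\}, \end{aligned}$$
since $S_n = S_n^c$ on the event $\{|X_k| \le c,\ k = 1, 2, \ldots, n\}$.

Corollary. If $X_1, X_2, \ldots, X_n$ are exchangeable, then
$$P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + nP\{|X_1| > c\}. \qquad (5)$$
If, in addition, the RVs $X_1, X_2, \ldots, X_n$ are independent, then
$$P\{|S_n - m_n| > \varepsilon\} \le \frac{nE(X_1^c)^2}{\varepsilon^2} + nP\{|X_1| > c\}. \qquad (6)$$
Inequality (6) yields the following important theorem.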


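Theorem 3. Let $\{X_n\}$ be a sequence of iid RVs with common finite mean $\mu = EX_1$. Then
$$\frac{S_n}{n} \xrightarrow{P} \mu \quad \text{as } n \to \infty.$$
Proof. Let us take $c = n$ in (6) and replace $\varepsilon$ by $n\varepsilon$; then we have
$$P\{|S_n - m_n| > n\varepsilon\} \le \frac{1}{n\varepsilon^2}E(X_1^n)^2 + nP\{|X_1| > n\},$$
where $X_1^n$ is $X_1$ truncated at $n$.

First note that $E|X_1| < \infty \implies nP\{|X_1| > n\} \to 0$ as $n \to \infty$. Now (see remarks following Lemma 3.2.1)
$$E(X_1^n)^2 \le 2\int_0^n xP\{|X_1| > x\}\,dx = 2\Bigl(\int_0^A + \int_A^n\Bigr)xP\{|X_1| > x\}\,dx,$$
where $A$ is chosen sufficiently large that
$$xP\{|X_1| > x\} < \frac{\delta}{2} \quad \text{for all } x \ge A,\ \delta > 0 \text{ arbitrary}.$$
Thus
$$E(X_1^n)^2 \le c + \delta\int_A^n dx \le c + n\delta,$$
where $c$ is a constant. It follows that
$$\frac{1}{n\varepsilon^2}E(X_1^n)^2 \le \frac{c}{n\varepsilon^2} + \frac{\delta}{\varepsilon^2},$$
and since $\delta$ is arbitrary, $(1/n\varepsilon^2)E(X_1^n)^2$ can be made arbitrarily small for sufficiently large $n$. The proof is now completed by the simple observation that since $EX_j = \mu$,
$$\frac{m_n}{n} = EX_1^n \to \mu \quad \text{as } n \to \infty.$$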


We emphasize that in Theorem 3 we require only that $E|X_1| < \infty$; nothing is said about the variance. Theorem 3 is due to Khintchine.


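Example 4. Let $X_1, X_2, \ldots$ be iid RVs with $E|X_1|^k < \infty$ for some positive integer $k$. Then
$$\sum_{j=1}^n \frac{X_j^k}{n} \xrightarrow{P} EX_1^k \quad \text{as } n \to \infty.$$
Thus, if $EX_1^2 < \infty$, then $\sum_{j=1}^n X_j^2/n \xrightarrow{P} EX_1^2$; and since $(\sum_{j=1}^n X_j/n)^2 \xrightarrow{P} (EX_1)^2$, it follows that
$$\frac{\sum_{j=1}^n X_j^2}{n} - \Bigl(\frac{\sum_{j=1}^n X_j}{n}\Bigr)^2 \xrightarrow{P} \operatorname{var}(X_1).$$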
n n 


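Example 5. Let $X_1, X_2, \ldots$ be iid RVs with common PDF
$$f(x) = \begin{cases} \dfrac{1 + \delta}{x^{2 + \delta}}, & x \ge 1, \\ 0, & x < 1, \end{cases} \qquad \delta > 0.$$
Then
$$E|X_1| = (1 + \delta)\int_1^\infty \frac{dx}{x^{1 + \delta}} = \frac{1 + \delta}{\delta} < \infty,$$
and the law of large numbers holds, that is,
$$n^{-1}S_n \xrightarrow{P} \frac{1 + \delta}{\delta} \quad \text{as } n \to \infty.$$
(For $0 < \delta \le 1$ we have $EX_1^2 = \infty$, so Theorem 1 does not apply, but Theorem 3 does.)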
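Khintchine's theorem needs only $E|X_1| < \infty$. The Python sketch below (an illustration under our own naming, using inverse-transform sampling from the standard `random` module with a fixed seed) draws from the density of Example 5. With $\delta = 1$ we have $EX_1 = 2$ but $EX_1^2 = \infty$, and the sample mean still settles near 2, although more slowly than in a finite-variance case such as $\delta = 2$.

```python
import random

def pareto_sample_mean(n, delta, seed=1):
    """Sample mean of n iid draws from f(x) = (1 + delta)/x**(2 + delta), x >= 1.

    Inversion: F(x) = 1 - x**(-(1 + delta)), so X = U**(-1/(1 + delta))
    with U uniform on (0, 1].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()               # in (0, 1], so the negative power is finite
        total += u ** (-1.0 / (1.0 + delta))
    return total / n

delta = 1.0                                  # mean (1 + delta)/delta = 2, infinite variance
for n in (1000, 10000, 200000):
    print(n, pareto_sample_mean(n, delta))
```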
PROBLEMS 6.3 
1. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common uniform distribution on $[0, 1]$. Also, let $Z_n = (\prod_{i=1}^n X_i)^{1/n}$ be the geometric mean of $X_1, X_2, \ldots, X_n$, $n = 1, 2, \ldots$. Show that $Z_n \xrightarrow{P} c$, where $c$ is a constant. Find $c$.


2. Let $X_1, X_2, \ldots$ be iid RVs with finite second moment. Let
$$Y_n = \frac{2}{n(n + 1)}\sum_{i=1}^n iX_i.$$
Show that $Y_n \xrightarrow{P} EX_1$.


3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2$. Let $S_k = \sum_{j=1}^k X_j$. Does the sequence $S_k$ obey the WLLN in the sense of Definition 1? If so, find the centering and the norming constants.


4. Let $\{X_n\}$ be a sequence of RVs for which $\operatorname{var}(X_n) \le C$ for all $n$ and $\rho_{ij} = \operatorname{cov}(X_i, X_j) \to 0$ as $|i - j| \to \infty$. Show that the WLLN holds.
5. For the following sequences of independent RVs, does the WLLN hold? 
(a) Р(Х, = $25 = 1. 
(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.
(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.
(d) Р{Х = £1/k} =. 
(e) PUX, = Ek) = 3. 
6. Let $X_1, X_2, \ldots$ be a sequence of independent RVs such that $\operatorname{var}(X_k) < \infty$ for $k = 1, 2, \ldots$, and $(1/n^2)\sum_{k=1}^n \operatorname{var}(X_k) \to 0$ as $n \to \infty$. Prove the WLLN, using Theorem 2.


7. Let $\{X_n\}$ be a sequence of RVs with common finite variance $\sigma^2$. Suppose that the correlation coefficient between $X_i$ and $X_j$ is $\le 0$ for all $i \ne j$. Show that the WLLN holds for the sequence $\{X_n\}$.


8. Let $\{X_n\}$ be a sequence of RVs such that $X_k$ is independent of $X_j$ whenever $j \ne k - 1, k, k + 1$. If $\operatorname{var}(X_k) \le C$ for all $k$, where $C$ is a constant, show that the WLLN holds for $\{X_k\}$.


9. For any sequence of RVs {Xn}, show that 


P t P 
max | Х| > 0n iş: > 0. 
lxkzn 


10. Let $X_1, X_2, \ldots$ be iid $C(1, 0)$ RVs. Use Theorem 2 to show that the weak law of large numbers does not hold. That is, show that
$$E\Bigl\{\frac{S_n^2}{n^2 + S_n^2}\Bigr\} \not\to 0 \quad \text{as } n \to \infty, \quad \text{where } S_n = \sum_{k=1}^n X_k,\ n = 1, 2, \ldots.$$
11. Let $\{X_n\}$ be a sequence of iid RVs with $P\{X_n > 0\} = 1$. Let $S_n = \sum_{j=1}^n X_j$, $n = 1, 2, \ldots$. Suppose that $\{a_n\}$ is a sequence of constants such that $a_n^{-1}S_n \xrightarrow{P} 1$. Show that (a) $a_n \to \infty$ as $n \to \infty$, and (b) $a_{n+1}/a_n \to 1$.

6.4 STRONG LAW OF LARGE NUMBERS†

In this section we obtain a stronger form of the law of large numbers discussed in Section 6.3. Let $X_1, X_2, \ldots$ be a sequence of RVs defined on a probability space $(\Omega, \mathcal{S}, P)$.


Definition 1. We say that the sequence $\{X_n\}$ obeys the strong law of large numbers (SLLN) with respect to the norming constants $\{B_n\}$ if there exists a sequence of (centering) constants $\{A_n\}$ such that
$$B_n^{-1}(S_n - A_n) \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty. \qquad (1)$$
Here $B_n > 0$ and $B_n \to \infty$ as $n \to \infty$.


We will obtain sufficient conditions for a sequence $\{X_n\}$ to obey the SLLN. In what follows we will be interested mainly in the case $B_n = n$. Indeed, when we speak of the SLLN we will assume that we are speaking of the norming constants $B_n = n$, unless specified otherwise.

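We start with the Borel-Cantelli lemma. Let $\{A_j\}$ be any sequence of events in $\mathcal{S}$. We recall that
$$\limsup_{n\to\infty} A_n = \lim_{n\to\infty}\bigcup_{k=n}^\infty A_k = \bigcap_{n=1}^\infty\bigcup_{k=n}^\infty A_k. \qquad (2)$$
We will write $A = \limsup_{n\to\infty} A_n$. Note that $A$ is the event that infinitely many of the $A_n$ occur. We will sometimes write
$$PA = P\Bigl(\limsup_{n\to\infty} A_n\Bigr) = P(A_n \text{ i.o.}),$$
where "i.o." stands for "infinitely often." In view of Theorem 6.2.11 and Remark 6.2.6 we have $X_n \xrightarrow{a.s.} 0$ if and only if $P\{|X_n| > \varepsilon \text{ i.o.}\} = 0$ for all $\varepsilon > 0$.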


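Theorem 1 (Borel-Cantelli Lemma)

(a) Let $\{A_n\}$ be a sequence of events such that $\sum_{n=1}^\infty PA_n < \infty$. Then $PA = 0$.

(b) If $\{A_n\}$ is an independent sequence of events such that $\sum_{n=1}^\infty PA_n = \infty$, then $PA = 1$.

Proof.

(a) $PA = P(\lim_{n\to\infty}\bigcup_{k=n}^\infty A_k) = \lim_{n\to\infty} P(\bigcup_{k=n}^\infty A_k) \le \lim_{n\to\infty}\sum_{k=n}^\infty PA_k = 0$.

(b) We have $A^c = \bigcup_{n=1}^\infty\bigcap_{k=n}^\infty A_k^c$, so that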


†This section may be omitted on first reading.


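$$P(A^c) = P\Bigl(\lim_{n\to\infty}\bigcap_{k=n}^\infty A_k^c\Bigr) = \lim_{n\to\infty} P\Bigl(\bigcap_{k=n}^\infty A_k^c\Bigr).$$
For $n_0 > n$, we see that $\bigcap_{k=n}^\infty A_k^c \subset \bigcap_{k=n}^{n_0} A_k^c$, so that
$$P\Bigl(\bigcap_{k=n}^\infty A_k^c\Bigr) = \lim_{n_0\to\infty} P\Bigl(\bigcap_{k=n}^{n_0} A_k^c\Bigr) = \lim_{n_0\to\infty}\prod_{k=n}^{n_0}(1 - PA_k),$$
because $\{A_n\}$ is an independent sequence of events. Now we use the elementary inequality
$$1 - \exp\Bigl(-\sum_{j=n}^{n_0}\alpha_j\Bigr) \le 1 - \prod_{j=n}^{n_0}(1 - \alpha_j) \le \sum_{j=n}^{n_0}\alpha_j, \qquad n_0 > n,\ 1 \ge \alpha_j \ge 0,$$
to conclude that
$$P\Bigl(\bigcap_{k=n}^\infty A_k^c\Bigr) \le \lim_{n_0\to\infty}\exp\Bigl(-\sum_{k=n}^{n_0} PA_k\Bigr).$$
Since the series $\sum_n PA_n$ diverges, the right side is 0 for every $n$; it follows that $PA^c = 0$ or $PA = 1$.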


Corollary. Let $\{A_n\}$ be a sequence of independent events. Then $PA$ is either 0 or 1.

The corollary follows since $\sum_{n=1}^\infty PA_n$ either converges or diverges.
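The elementary inequality used in part (b) is easy to check numerically. The Python sketch below (an illustration with our own function name) evaluates its three members for random weights $\alpha_j \in [0, 1)$ and for $\alpha_k = 1/k$, the standard divergent case, where $\prod_{k=2}^{n_0}(1 - 1/k) = 1/n_0 \to 0$.

```python
import math
import random

def bc_bounds(alphas):
    """Return (1 - exp(-sum a), 1 - prod(1 - a), sum a) for 0 <= a_j <= 1.
    The proof of Theorem 1(b) uses lower <= middle <= upper."""
    total = sum(alphas)
    prod = 1.0
    for a in alphas:
        prod *= 1.0 - a
    return 1.0 - math.exp(-total), 1.0 - prod, total

rng = random.Random(3)
for _ in range(5):
    alphas = [rng.random() for _ in range(rng.randint(1, 20))]
    print(bc_bounds(alphas))

# P(A_k) = 1/k: the series diverges and the probability that no A_k occurs,
# prod_{k=2}^{n0} (1 - 1/k) = 1/n0, goes to 0, i.e. the middle member goes to 1.
print(bc_bounds([1.0 / k for k in range(2, 10001)])[1])
```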


As a simple application of the Borel-Cantelli lemma, we obtain a version of the 
SLLN. 


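Theorem 2. If $X_1, X_2, \ldots$ are iid RVs with common mean $\mu$ and finite fourth moment, then
$$\frac{S_n}{n} \xrightarrow{a.s.} \mu.$$
Proof. We have
$$E\Bigl\{\sum_{i=1}^n (X_i - \mu)\Bigr\}^4 = nE(X_1 - \mu)^4 + 6\binom{n}{2}\sigma^4 \le Cn^2,$$
where $\sigma^2 = \operatorname{var}(X_1)$ and $C$ is a constant. By Markov's inequality,
$$P\Bigl\{\Bigl|\sum_{i=1}^n (X_i - \mu)\Bigr| > n\varepsilon\Bigr\} \le \frac{E\{\sum_{i=1}^n (X_i - \mu)\}^4}{(n\varepsilon)^4} \le \frac{Cn^2}{n^4\varepsilon^4} = \frac{C'}{n^2}.$$
Therefore,
$$\sum_{n=1}^\infty P\{|S_n - \mu n| > n\varepsilon\} < \infty,$$
and it follows by the Borel-Cantelli lemma that with probability 1 only finitely many of the events $\{\omega: |(S_n/n) - \mu| > \varepsilon\}$ occur, that is, $PA_\varepsilon = 0$, where
$$A_\varepsilon = \limsup_{n\to\infty}\Bigl\{\Bigl|\frac{S_n}{n} - \mu\Bigr| > \varepsilon\Bigr\}.$$
The sets $A_\varepsilon$ increase, as $\varepsilon \to 0$, to the $\omega$ set on which $S_n/n \not\to \mu$. Letting $\varepsilon \to 0$ through a countable set of values, we have
$$P\Bigl\{\frac{S_n}{n} - \mu \not\to 0\Bigr\} = P\Bigl(\bigcup_{k=1}^\infty A_{1/k}\Bigr) = 0.$$

Corollary. If $X_1, X_2, \ldots$ are iid RVs such that $P\{|X_n| \le K\} = 1$ for all $n$, where $K$ is a positive constant, then $n^{-1}S_n \xrightarrow{a.s.} \mu$.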


Theorem 3. Let $X_1, X_2, \ldots$ be a sequence of independent RVs. Then
$$X_n \xrightarrow{a.s.} 0 \iff \sum_{n=1}^\infty P\{|X_n| > \varepsilon\} < \infty \quad \text{for all } \varepsilon > 0.$$
Proof. Writing $A_n = \{|X_n| > \varepsilon\}$, we see that $\{A_n\}$ is a sequence of independent events. Since $X_n \xrightarrow{a.s.} 0$, $X_n \to 0$ on a set $E^c$ with $PE = 0$. A point $\omega \in E^c$ belongs only to a finite number of the $A_n$. It follows that
$$\limsup_{n\to\infty} A_n \subset E,$$
hence $P(A_n \text{ i.o.}) = 0$. By the Borel-Cantelli lemma [Theorem 1(b)] we must have $\sum PA_n < \infty$. [Otherwise, $\sum PA_n = \infty$, and then $P(A_n \text{ i.o.}) = 1$.]
In the other direction, let
$$A_{1/k} = \limsup_{n\to\infty}\Bigl\{|X_n| > \frac{1}{k}\Bigr\},$$
and use the argument in the proof of Theorem 2.
Example 1. We take an application of the Borel-Cantelli lemma to prove a.s. convergence. Let $\{X_n\}$ have PMF
$$P\{X_n = 0\} = 1 - \frac{1}{n^\alpha} \quad \text{and} \quad P\{X_n = \pm n\} = \frac{1}{2n^\alpha}.$$
Then $P\{|X_n| > \varepsilon\} = 1/n^\alpha$ (for $n > \varepsilon$) and it follows that
$$\sum_{n=1}^\infty P\{|X_n| > \varepsilon\} \le \sum_{n=1}^\infty \frac{1}{n^\alpha} < \infty \quad \text{for } \alpha > 1.$$
Thus from the Borel-Cantelli lemma $P(A_n \text{ i.o.}) = 0$, where $A_n = \{|X_n| > \varepsilon\}$. Now using the argument in the proof of Theorem 2, we can show that $P\{X_n \to 0\} = 1$.

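We next prove some important lemmas that we will need subsequently.
Lemma 1 (Kolmogorov's Inequality). Let $X_1, X_2, \ldots, X_n$ be independent RVs with common mean 0 and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$, respectively. Then for any $\varepsilon > 0$,
$$P\Bigl\{\max_{1 \le k \le n}|S_k| \ge \varepsilon\Bigr\} \le \frac{\sum_{k=1}^n \sigma_k^2}{\varepsilon^2}. \qquad (3)$$
Proof. Let $A_0 = \Omega$,
$$A_k = \Bigl\{\max_{1 \le j \le k}|S_j| < \varepsilon\Bigr\}, \qquad k = 1, 2, \ldots, n,$$
and
$$\begin{aligned} B_k &= A_{k-1} \cap A_k^c \\ &= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon\} \cap \{\text{at least one of } |S_1|, \ldots, |S_k| \text{ is } \ge \varepsilon\} \\ &= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}. \end{aligned}$$
It follows that
$$A_n^c = \bigcup_{k=1}^n B_k \quad \text{(a disjoint union)}$$
and
$$B_k \subset \{|S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.$$
As usual, let us write $I_{B_k}$ for the indicator function of the event $B_k$. Then
$$\begin{aligned} E(S_nI_{B_k})^2 &= E\{(S_n - S_k)I_{B_k} + S_kI_{B_k}\}^2 \\ &= E\{(S_n - S_k)^2I_{B_k} + S_k^2I_{B_k} + 2S_k(S_n - S_k)I_{B_k}\}. \end{aligned}$$
Since $S_n - S_k = X_{k+1} + \cdots + X_n$ and $S_kI_{B_k}$ are independent, and $EX_k = 0$ for all $k$, it follows that
$$E(S_nI_{B_k})^2 = E\{(S_n - S_k)^2I_{B_k}\} + E(S_kI_{B_k})^2 \ge E(S_kI_{B_k})^2 \ge \varepsilon^2PB_k.$$
The last inequality follows from the fact that in $B_k$, $|S_k| \ge \varepsilon$. Moreover,
$$\sum_{k=1}^n E(S_nI_{B_k})^2 = E(S_n^2I_{A_n^c}) \le E(S_n^2) = \sum_{k=1}^n \sigma_k^2,$$
so that
$$\sum_{k=1}^n \sigma_k^2 \ge \varepsilon^2\sum_{k=1}^n PB_k = \varepsilon^2P(A_n^c),$$
as asserted.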


Corollary. Take $n = 1$; then
$$P\{|X_1| \ge \varepsilon\} \le \frac{\sigma_1^2}{\varepsilon^2},$$
which is Chebychev's inequality.
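For a simple symmetric random walk ($X_k = \pm 1$ with probability $\frac12$ each, so $\sigma_k^2 = 1$), the left side of (3) can be computed exactly by dynamic programming. The Python sketch below (our own construction, assuming an integer $\varepsilon$) tracks the probability mass that has not yet reached $\{|s| \ge \varepsilon\}$ and compares the result with the bound $n/\varepsilon^2$.

```python
def kolmogorov_check(n, eps):
    """Exact P{max_{1<=k<=n} |S_k| >= eps} for a simple +/-1 random walk,
    together with Kolmogorov's bound sum(var X_k)/eps^2 = n/eps^2.

    Assumes eps is a positive integer.
    """
    alive = {0: 1.0}                  # mass at positions s with |s| < eps, not yet absorbed
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                t = s + step
                if abs(t) < eps:      # still strictly inside: keep tracking
                    nxt[t] = nxt.get(t, 0.0) + 0.5 * p
        alive = nxt
    return 1.0 - sum(alive.values()), n / eps**2

for n, eps in ((1, 1), (100, 20), (400, 30)):
    print(n, eps, kolmogorov_check(n, eps))
```

The case $n = 1$, $\varepsilon = 1$ gives equality, matching the fact that Chebychev's inequality is attained by a symmetric two-point law; for longer walks the bound is conservative.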


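Lemma 2 (Kronecker Lemma). If $\sum_{n=1}^\infty x_n$ converges to $s$ (finite) and $b_n \uparrow \infty$, then
$$b_n^{-1}\sum_{k=1}^n b_kx_k \to 0.$$
Proof. Writing $b_0 = 0$, $a_k = b_k - b_{k-1}$, and $s_{n+1} = \sum_{k=1}^n x_k$ (so that $s_1 = 0$), we have
$$\begin{aligned} \frac{1}{b_n}\sum_{k=1}^n b_kx_k &= \frac{1}{b_n}\sum_{k=1}^n b_k(s_{k+1} - s_k) \\ &= \frac{1}{b_n}\Bigl\{b_ns_{n+1} + \sum_{k=1}^n b_{k-1}s_k - \sum_{k=1}^n b_ks_k\Bigr\} \\ &= s_{n+1} - \frac{1}{b_n}\sum_{k=1}^n (b_k - b_{k-1})s_k \\ &= s_{n+1} - \frac{1}{b_n}\sum_{k=1}^n a_ks_k. \end{aligned}$$
It therefore suffices to show that $b_n^{-1}\sum_{k=1}^n a_ks_k \to s$. Since $s_n \to s$, there exists an $n_0 = n_0(\varepsilon)$ such that
$$|s_n - s| < \frac{\varepsilon}{2} \quad \text{for } n > n_0.$$
Since $b_n \uparrow \infty$, let $n_1$ be an integer $> n_0$ such that
$$\Bigl|b_n^{-1}\sum_{k=1}^{n_0}(b_k - b_{k-1})(s_k - s)\Bigr| < \frac{\varepsilon}{2} \quad \text{for } n > n_1.$$
Writing
$$r_n = b_n^{-1}\sum_{k=1}^n (b_k - b_{k-1})s_k,$$
we see, since $\sum_{k=1}^n (b_k - b_{k-1}) = b_n$, that
$$|r_n - s| = \frac{1}{b_n}\Bigl|\sum_{k=1}^n (b_k - b_{k-1})(s_k - s)\Bigr|,$$
and choosing $n > n_1$, we have
$$|r_n - s| \le \Bigl|\frac{1}{b_n}\sum_{k=1}^{n_0}(b_k - b_{k-1})(s_k - s)\Bigr| + \frac{1}{b_n}\sum_{k=n_0+1}^n (b_k - b_{k-1})|s_k - s| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2}\cdot\frac{b_n - b_{n_0}}{b_n} < \varepsilon.$$
This completes the proof.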
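The summation-by-parts step is the heart of the proof. The Python sketch below (a hypothetical helper, not part of the text) evaluates both sides of that identity exactly with rational arithmetic, and then illustrates the lemma with $x_k = 1/k^2$ and $b_k = k$, for which $b_n^{-1}\sum_{k=1}^n b_kx_k = n^{-1}\sum_{k=1}^n k^{-1}$.

```python
from fractions import Fraction

def kronecker_identity(x, b):
    """For x = [x_1, ..., x_n] and increasing b = [b_1, ..., b_n], return (lhs, rhs) with
    lhs = b_n^{-1} sum b_k x_k and rhs = s_{n+1} - b_n^{-1} sum (b_k - b_{k-1}) s_k,
    where b_0 = 0 and s_k = x_1 + ... + x_{k-1} (so s_1 = 0), as in the proof of Lemma 2."""
    n = len(x)
    lhs = sum(bk * xk for bk, xk in zip(b, x)) / b[-1]
    s = [0]                            # s[j] holds s_{j+1}
    for xk in x:
        s.append(s[-1] + xk)
    b_prev = [0] + list(b[:-1])        # b_0, b_1, ..., b_{n-1}
    abel = sum((b[j] - b_prev[j]) * s[j] for j in range(n))
    rhs = s[n] - abel / b[-1]
    return lhs, rhs

# Exact check of the identity with x_k = 1/k^2, b_k = k.
xs = [Fraction(1, k * k) for k in range(1, 21)]
bs = [Fraction(k) for k in range(1, 21)]
print(kronecker_identity(xs, bs))

# The lemma: sum 1/k^2 converges, so n^{-1} sum k * (1/k^2) = H_n / n -> 0.
for n in (10, 1000, 100000):
    print(n, sum(1.0 / k for k in range(1, n + 1)) / n)
```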


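Theorem 4. Let $\{X_n\}$ be independent RVs. If $\sum_{n=1}^\infty \operatorname{var}(X_n) < \infty$, then $\sum_{n=1}^\infty (X_n - EX_n)$ converges almost surely.

Proof. Without loss of generality, assume that $EX_n = 0$. By Kolmogorov's inequality,
$$P\Bigl\{\max_{1 \le k \le n}|S_{m+k} - S_m| \ge \varepsilon\Bigr\} \le \frac{1}{\varepsilon^2}\sum_{k=1}^n \operatorname{var}(X_{m+k}).$$
Letting $n \to \infty$, we have
$$P\Bigl\{\sup_{k \ge 1}|S_{m+k} - S_m| > \varepsilon\Bigr\} = P\Bigl\{\sup_{k \ge m+1}|S_k - S_m| > \varepsilon\Bigr\} \le \frac{1}{\varepsilon^2}\sum_{k=m+1}^\infty \operatorname{var}(X_k).$$
It follows that
$$\lim_{m\to\infty} P\Bigl\{\sup_{k \ge 1}|S_{m+k} - S_m| \le \varepsilon\Bigr\} = 1,$$
and since $\varepsilon > 0$ is arbitrary and $\sup_{j, k \ge m}|S_j - S_k| \le 2\sup_{k \ge 1}|S_{m+k} - S_m|$ is nonincreasing in $m$, the sequence $\{S_n\}$ is a Cauchy sequence with probability 1, that is,
$$P\Bigl\{\lim_{m\to\infty}\sup_{j, k \ge m}|S_j - S_k| = 0\Bigr\} = 1.$$
Consequently, $\sum_{j=1}^\infty X_j$ converges a.s.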





As a corollary we get a version of the SLLN for nonidentically distributed RVs which subsumes Theorem 2.

Corollary 1. Let $\{X_n\}$ be independent RVs. If
$$\sum_{k=1}^\infty \frac{\operatorname{var}(X_k)}{B_k^2} < \infty, \qquad B_n \uparrow \infty,$$
then
$$\frac{S_n - ES_n}{B_n} \xrightarrow{a.s.} 0.$$
The corollary follows from Theorem 4 and the Kronecker lemma.


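Corollary 2. Every sequence $\{X_n\}$ of independent RVs with uniformly bounded variances obeys the SLLN.

If $\operatorname{var}(X_k) \le A$ for all $k$, and $B_k = k$, then
$$\sum_{k=1}^\infty \frac{\operatorname{var}(X_k)}{k^2} \le A\sum_{k=1}^\infty \frac{1}{k^2} < \infty,$$
and it follows that
$$\frac{S_n - ES_n}{n} \xrightarrow{a.s.} 0.$$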
n 


Corollary 3 (Borel's Strong Law of Large Numbers). For a sequence of Bernoulli trials with (constant) probability $p$ of success, the SLLN holds (with $B_n = n$ and $A_n = np$).

Since
$$EX_k = p, \qquad \operatorname{var}(X_k) = p(1 - p) \le \frac14, \qquad 0 < p < 1,$$
the result follows from Corollary 2.

Corollary 4. Let $\{X_n\}$ be iid RVs with common mean $\mu$ and finite variance $\sigma^2$. Then
$$P\Bigl\{\lim_{n\to\infty}\frac{S_n}{n} = \mu\Bigr\} = 1.$$

Remark 1. Kolmogorov's SLLN is much stronger than Corollaries 1 and 4 to Theorem 4. It states that if $\{X_n\}$ is a sequence of iid RVs, then
$$n^{-1}S_n \xrightarrow{a.s.} \mu \iff E|X_1| < \infty,$$
and then $\mu = EX_1$. The proof requires more work and will not be given here. We refer the reader to Billingsley [5], Chung [14], Feller [23], or Laha and Rohatgi [56].
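Almost sure convergence is a statement about whole sample paths. The Python sketch below (a simulation with our own helper names and a fixed seed) follows a single path of Bernoulli trials, as in Borel's strong law (Corollary 3), and reports $\max_{N \le k \le n}|S_k/k - p|$ over a finite horizon. This is a finite stand-in for the supremum that Theorem 6.2.11 requires to be small in probability; along one path it shrinks as $N$ grows.

```python
import random

def running_deviation(n, p, seed=2):
    """One simulated path of Bernoulli(p) trials; returns [|S_k/k - p| for k = 1..n]."""
    rng = random.Random(seed)
    successes = 0
    devs = []
    for k in range(1, n + 1):
        successes += 1 if rng.random() < p else 0
        devs.append(abs(successes / k - p))
    return devs

def tail_sup(devs, start):
    """max_{start <= k <= n} |S_k/k - p| over the simulated range."""
    return max(devs[start - 1:])

devs = running_deviation(100000, 0.3)
for start in (10, 1000, 10000, 50000):
    print(start, tail_sup(devs, start))
```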


PROBLEMS 6.4 


1. For the following sequences of independent RVs does the SLLN hold?

(a) P{X, = +2} =} 

(b P(X, = k} = TY: P(X, = 0} = 1— (1/41). 

(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.

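2. Let $X_1, X_2, \ldots$ be a sequence of independent RVs with $\sum_{k=1}^\infty \operatorname{var}(X_k)/k^2 < \infty$. Show that
$$\frac{1}{n}\sum_{k=1}^n (X_k - EX_k) \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty.$$
Does the converse also hold?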


3. For what values of $\alpha$ does the SLLN hold for the sequence
$P\{X_k = \pm k^\alpha\} = \frac12$?


4. Let $\{\sigma_k^2\}$ be a sequence of real numbers such that $\sum_{k=1}^\infty \sigma_k^2/k^2 = \infty$. Show that there exists a sequence of independent RVs $\{X_k\}$ with $\operatorname{var}(X_k) = \sigma_k^2$, $k = 1, 2, \ldots$, such that $n^{-1}\sum_{k=1}^n (X_k - EX_k)$ does not converge to 0 almost surely.
[Hint: Let $P\{X_k = \pm k\} = \sigma_k^2/(2k^2)$, $P\{X_k = 0\} = 1 - (\sigma_k^2/k^2)$ if $\sigma_k/k \le 1$, and $P\{X_k = \pm\sigma_k\} = \frac12$ if $\sigma_k/k > 1$. Apply the Borel-Cantelli lemma to $\{|X_n| \ge n\}$.]


5. Let $\{X_n\}$ be a sequence of iid RVs with $E|X_1| = +\infty$. Show that for every positive number $A$, $P\{|X_n| > nA \text{ i.o.}\} = 1$ and $P\{|S_n| > nA \text{ i.o.}\} = 1$.


6. Construct an example to show that the converse of Theorem 1(a) does not hold. 


7. Investigate a.s. convergence of $\{X_n\}$ to 0 in each case. ($X_n$'s are independent in each case.)
(a) $P\{X_n = e^n\} = 1/n^2$, $P\{X_n = 0\} = 1 - 1/n^2$.
(b) $P\{X_n = 0\} = 1 - 1/n$, $P\{X_n = \pm 1\} = 1/(2n)$.


6.5 LIMITING MOMENT GENERATING FUNCTIONS 


Let $X_1, X_2, \ldots$ be a sequence of RVs. Let $F_n$ be the DF of $X_n$, $n = 1, 2, \ldots$, and suppose that the MGF $M_n(t)$ of $F_n$ exists. What happens to $M_n(t)$ as $n \to \infty$? If it converges, does it always converge to an MGF?


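Example 1. Let $\{X_n\}$ be a sequence of RVs with PMF $P\{X_n = -n\} = 1$, $n = 1, 2, \ldots$. We have
$$M_n(t) = Ee^{tX_n} = e^{-tn} \to 0 \quad \text{as } n \to \infty \text{ for all } t > 0,$$
and
$$M_n(t) \to +\infty \text{ for all } t < 0, \qquad M_n(t) \to 1 \text{ at } t = 0.$$
Thus
$$M_n(t) \to M(t) = \begin{cases} 0, & t > 0, \\ 1, & t = 0, \\ \infty, & t < 0, \end{cases} \qquad \text{as } n \to \infty.$$
But $M(t)$ is not an MGF. Note that if $F_n$ is the DF of $X_n$, then
$$F_n(x) = \begin{cases} 0 & \text{if } x < -n, \\ 1 & \text{if } x \ge -n, \end{cases} \qquad F_n(x) \to F(x) = 1 \quad \text{for all } x,$$
and $F$ is not a DF.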


Next suppose that $X_n$ has MGF $M_n$ and $X_n \xrightarrow{L} X$, where $X$ is an RV with MGF $M$. Does $M_n(t) \to M(t)$ as $n \to \infty$? The answer to this question is in the negative.


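Example 2 (Curtiss [18]). Consider the DF
$$F_n(x) = \begin{cases} 0, & x < -n, \\ \tfrac12 + c_n\tan^{-1}(nx), & -n \le x < n, \\ 1, & x \ge n, \end{cases}$$
where $c_n = 1/[2\tan^{-1}(n^2)]$. Clearly, as $n \to \infty$,
$$F_n(x) \to F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$
at all points of continuity of the DF $F$. The MGF associated with $F_n$ is
$$M_n(t) = \int_{-n}^n c_ne^{tx}\frac{n}{1 + n^2x^2}\,dx,$$
which exists for all $t$. The MGF corresponding to $F$ is $M(t) = 1$ for all $t$. But $M_n(t) \not\to M(t)$, since $M_n(t) \to \infty$ if $t \ne 0$. Indeed, since $e^y \ge y^3/6$ for $y \ge 0$ and the density of $F_n$ is symmetric about 0,
$$M_n(t) \ge \int_0^n c_n\frac{|t|^3x^3}{6}\cdot\frac{n}{1 + n^2x^2}\,dx,$$
and the right side tends to $\infty$ as $n \to \infty$.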
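A rough numerical check of Example 2: the Python sketch below (our own helper; it uses a plain midpoint rule, an assumption that is adequate for this smooth integrand) approximates $M_n(t)$ for the Curtiss distributions. At $t = 0$ it returns the total mass 1, while at $t = \pm 1$ the values grow rapidly with $n$, even though $F_n$ converges to the DF of the RV degenerate at 0, whose MGF is identically 1.

```python
import math

def curtiss_mgf(n, t, steps=100000):
    """Midpoint-rule approximation of
    M_n(t) = integral_{-n}^{n} c_n e^{tx} n / (1 + n^2 x^2) dx,  c_n = 1/(2 arctan(n^2))."""
    c_n = 1.0 / (2.0 * math.atan(n * n))
    h = 2.0 * n / steps
    total = 0.0
    for i in range(steps):
        x = -n + (i + 0.5) * h
        total += math.exp(t * x) * n / (1.0 + (n * x) ** 2)
    return c_n * total * h

for n in (5, 10, 20):
    print(n, curtiss_mgf(n, 0.0), curtiss_mgf(n, 1.0))
```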


The following result is a weaker version of the continuity theorem due to Lévy and Cramér. We refer the reader to Lukacs [68, p. 47], or Curtiss [18], for details of the proof.

Theorem 1 (Continuity Theorem). Let $\{F_n\}$ be a sequence of DFs with corresponding MGFs $\{M_n\}$, and suppose that $M_n(t)$ exists for $|t| \le t_0$ for every $n$. If there exists a DF $F$ with corresponding MGF $M$ which exists for $|t| \le t_1 < t_0$, such that $M_n(t) \to M(t)$ as $n \to \infty$ for every $t \in [-t_1, t_1]$, then $F_n \xrightarrow{w} F$.


Example 3. Let $X_n$ be an RV with PMF

$$P\{X_n = 1\} = \frac1n, \qquad P\{X_n = 0\} = 1 - \frac1n.$$

Then $M_n(t) = (1/n)e^t + [1 - (1/n)]$ exists for all $t \in \mathbb{R}$, and $M_n(t) \to 1$ as $n \to \infty$ for all $t$. Here $M(t) = 1$ is the MGF of an RV $X$ degenerate at 0. Thus $X_n \xrightarrow{L} X$.


Remark 1. The following notation on orders of magnitude is quite useful. We write $x_n = o(r_n)$ if, given $\varepsilon > 0$, there exists an $N$ such that $|x_n/r_n| < \varepsilon$ for all $n \ge N$, and $x_n = O(r_n)$ if there exist an $N$ and a constant $c > 0$ such that $|x_n/r_n| < c$ for all $n \ge N$. We write $x_n = O(1)$ to express the fact that $x_n$ is bounded for large $n$, and $x_n = o(1)$ to mean that $x_n \to 0$ as $n \to \infty$.

This notation is extended to RVs in an obvious manner. Thus $X_n = o_p(r_n)$ if, for every $\varepsilon > 0$ and $\delta > 0$, there exists an $N$ such that $P(|X_n/r_n| \le \delta) \ge 1 - \varepsilon$ for $n \ge N$, and $X_n = O_p(r_n)$ if for every $\varepsilon > 0$ there exist a $c > 0$ and an $N$ such that $P(|X_n/r_n| \le c) \ge 1 - \varepsilon$ for $n \ge N$. We write $X_n = o_p(1)$ to mean $X_n \xrightarrow{P} 0$.
The following lemma is quite useful in applications of Theorem 1. 


Lemma 1. Let us write $f(x) = o(x)$ if $f(x)/x \to 0$ as $x \to 0$. We have

$$\lim_{n\to\infty}\left[1 + \frac an + o\left(\frac1n\right)\right]^n = e^a \quad \text{for every real } a.$$
Proof. By Taylor's expansion we have

$$f(x) = f(0) + xf'(\theta x) = f(0) + xf'(0) + [f'(\theta x) - f'(0)]x, \quad 0 < \theta < 1.$$

If $f'(x)$ is continuous at $x = 0$, then as $x \to 0$,

$$f(x) = f(0) + xf'(0) + o(x).$$

Taking $f(x) = \log(1 + x)$, we have $f'(x) = (1 + x)^{-1}$, which is continuous at $x = 0$, so that

$$\log(1 + x) = x + o(x).$$

Then for sufficiently large $n$,

$$n\log\left[1 + \frac an + o\left(\frac1n\right)\right] = n\left\{\frac an + o\left(\frac1n\right) + o\left[\frac an + o\left(\frac1n\right)\right]\right\} = a + n\,o\left(\frac1n\right) = a + o(1).$$

It follows that

$$\left[1 + \frac an + o\left(\frac1n\right)\right]^n \to e^a,$$

as asserted.

Example 4. Let $X_1, X_2, \dots$ be iid $b(1, p)$ RVs. Also, let $S_n = \sum_{k=1}^n X_k$, and let $M_n(t)$ be the MGF of $S_n$. Then

$$M_n(t) = (q + pe^t)^n \quad \text{for all } t,$$

where $q = 1 - p$. If we let $n \to \infty$ in such a way that $np$ remains constant at $\lambda$, say, then, by Lemma 1,

$$M_n(t) = \left[1 - \frac\lambda n + \frac\lambda n e^t\right]^n = \left[1 + \frac\lambda n(e^t - 1)\right]^n \to \exp[\lambda(e^t - 1)] \quad \text{for all } t,$$

which is the MGF of a $P(\lambda)$ RV. Thus, the binomial distribution function approaches the Poisson DF, provided that $n \to \infty$ in such a way that $np = \lambda > 0$.
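A quick numerical look at the limit in Example 4: the Python sketch below (standard library only) holds $np = \lambda$ fixed and reports the largest pointwise gap between the $b(n, \lambda/n)$ and $P(\lambda)$ PMFs over $k = 0, \dots, 30$. The cutoff of 30 and the helper names are arbitrary choices for illustration.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P{S_n = k} for S_n ~ b(n, p); math.comb returns 0 when k > n."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P{X = k} for X ~ P(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n: int, lam: float, kmax: int = 30) -> float:
    """Largest |b(n, lam/n) - P(lam)| over k = 0..kmax, with np held fixed at lam."""
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax + 1))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, max_pmf_gap(n, 3.0))  # the gap shrinks as n grows with np = 3
```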


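Example 5. Let $X \sim P(\lambda)$. The MGF of $X$ is given by

$$M(t) = \exp[\lambda(e^t - 1)] \quad \text{for all } t.$$

Let $Y = (X - \lambda)/\sqrt\lambda$. Then the MGF of $Y$ is given by

$$M_Y(t) = e^{-t\sqrt\lambda}\,M\left(\frac{t}{\sqrt\lambda}\right).$$

Also,

$$\begin{aligned} \log M_Y(t) &= -t\sqrt\lambda + \log M\left(\frac{t}{\sqrt\lambda}\right) = -t\sqrt\lambda + \lambda\left(e^{t/\sqrt\lambda} - 1\right)\\ &= -t\sqrt\lambda + \lambda\left(\frac{t}{\sqrt\lambda} + \frac{t^2}{2\lambda} + \frac{t^3}{3!\,\lambda^{3/2}} + \cdots\right) = \frac{t^2}{2} + \frac{t^3}{3!\,\lambda^{1/2}} + \cdots. \end{aligned}$$

It follows that

$$\log M_Y(t) \to \frac{t^2}{2} \quad \text{as } \lambda \to \infty,$$

so that $M_Y(t) \to e^{t^2/2}$ as $\lambda \to \infty$, which is the MGF of an $N(0, 1)$ RV.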
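The exact expression $\log M_Y(t) = -t\sqrt\lambda + \lambda(e^{t/\sqrt\lambda} - 1)$ can be evaluated directly to watch the convergence. The Python sketch below (standard library only, function name our own) uses `math.expm1` to avoid cancellation when $t/\sqrt\lambda$ is small.

```python
import math

def log_mgf_standardized_poisson(t: float, lam: float) -> float:
    """ln M_Y(t) for Y = (X - lam)/sqrt(lam), X ~ P(lam):
    -t*sqrt(lam) + lam*(exp(t/sqrt(lam)) - 1)."""
    r = math.sqrt(lam)
    # expm1 keeps precision when t/sqrt(lam) is close to 0
    return -t * r + lam * math.expm1(t / r)

if __name__ == "__main__":
    for lam in (1e2, 1e4, 1e6):
        print(lam, log_mgf_standardized_poisson(1.0, lam))  # approaches t^2/2 = 0.5
```

The gap from $t^2/2$ behaves like the first neglected term $t^3/(3!\,\lambda^{1/2})$; for $t = 1$, $\lambda = 100$ it is about $0.017$.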


For more examples, see Section 6.6. 


Remark 2. As pointed out earlier, working with MGFs has the disadvantage that the existence of MGFs is a very strong condition. Working with CFs, which always exist, on the other hand, permits a much wider application of the continuity theorem. Let $\phi_n$ be the CF of $F_n$. Then $F_n \xrightarrow{w} F$ if and only if $\phi_n \to \phi$ as $n \to \infty$ on $\mathbb{R}$, where $\phi$ is continuous at $t = 0$. In this case $\phi$, the limit function, is the CF of the limit DF $F$.


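Example 6. Let $X$ be a $C(1, 0)$ RV. Then its CF is given by

$$\phi(t) = E\exp(itX) = \frac1\pi\int_{-\infty}^{\infty}\frac{\cos tx}{1 + x^2}\,dx + \frac i\pi\int_{-\infty}^{\infty}\frac{\sin tx}{1 + x^2}\,dx = \frac1\pi\int_{-\infty}^{\infty}\frac{\cos tx}{1 + x^2}\,dx = e^{-|t|},$$

since the second integral on the right side vanishes (its integrand is an odd function of $x$).

Let $\{X_n\}$ be iid RVs with common law $\mathcal{L}(X)$, and set $Y_n = \sum_{j=1}^n X_j/n$. Then the CF of $Y_n$ is given by

$$\phi_{Y_n}(t) = E\exp(itY_n) = \prod_{j=1}^n \exp\left(-\frac{|t|}{n}\right) = \exp(-|t|)$$

for all $n$. It follows that $\phi_{Y_n}$ is the CF of a $C(1, 0)$ RV. We could not have derived this result using MGFs, since the Cauchy law has no MGF. Also, if $U_n = \sum_{j=1}^n X_j/n^\alpha$ for $\alpha > 1$, then

$$\phi_{U_n}(t) = \exp\left(-\frac{|t|}{n^{\alpha - 1}}\right) \to 1$$

as $n \to \infty$ for all $t$. Since $\phi(t) = 1$ is continuous at $t = 0$, $\phi$ is the CF of the limit DF $F$. Clearly, $F$ is the DF of an RV degenerate at 0. Thus $\sum_{j=1}^n X_j/n^\alpha \xrightarrow{L} U$, where $P(U = 0) = 1$.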
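Example 6 says the average of $n$ iid $C(1, 0)$ RVs is again $C(1, 0)$, so averaging does not concentrate the distribution. The Monte Carlo sketch below (Python, standard library only) draws standard Cauchy variates by the inverse-DF method $\tan[\pi(U - \frac12)]$. It then reports the interquartile range of simulated values of $Y_n$. Since the quartiles of $C(1, 0)$ are $\pm 1$, this range should stay near 2 for every $n$. The seed, replication count, and function names are illustrative assumptions, and the output is subject to simulation error.

```python
import math
import random
import statistics

def standard_cauchy(rng: random.Random) -> float:
    """Inverse-DF draw from the standard Cauchy law (location 0, scale 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(n: int, reps: int = 10_000, seed: int = 2024) -> float:
    """Interquartile range of `reps` simulated values of Y_n = (X_1 + ... + X_n)/n."""
    rng = random.Random(seed)
    means = [sum(standard_cauchy(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(iqr_of_means(n), 3))  # all near 2: no concentration as n grows
```

For comparison, the same experiment with $N(0, 1)$ draws would show the interquartile range shrinking like $1/\sqrt n$.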


PROBLEMS 6.5 


1. Let $X \sim NB(r; p)$. Show that

$$2pX \xrightarrow{L} Y \quad \text{as } p \to 0,$$

where $Y \sim \chi^2(2r)$.
2. Let $X_n \sim NB(r_n; 1 - p_n)$, $n = 1, 2, \dots$. Show that $X_n \xrightarrow{L} X$ as $r_n \to \infty$, $p_n \to 0$, in such a way that $r_np_n \to \lambda$, where $X \sim P(\lambda)$.
3. Let $X_1, X_2, \dots$ be independent RVs with PMF given by $P\{X_n = \pm 1\} = \frac12$, $n = 1, 2, \dots$. Let $Z_n = \sum_{j=1}^n X_j/2^j$. Show that $Z_n \xrightarrow{L} Z$, where $Z \sim U[-1, 1]$.

4. Let $\{X_n\}$ be a sequence of RVs with $X_n \sim G(n, \beta)$, where $\beta > 0$ is a constant (independent of $n$). Find the limiting distribution of $X_n/n$.

5. Let $X_n \sim \chi^2(n)$, $n = 1, 2, \dots$. Find the limiting distribution of $X_n/n^2$.
6. Let $X_1, X_2, \dots, X_n$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$ and $\operatorname{cov}(X_i, X_j) = \rho$, $i, j = 1, 2, \dots$ $(i \ne j)$. What is the limiting distribution of $n^{-1}S_n$, where $S_n = \sum_{k=1}^n X_k$?


6.6 CENTRAL LIMIT THEOREM 


Let $X_1, X_2, \dots$ be a sequence of RVs, and let $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \dots$. In Sections 6.3 and 6.4 we investigated the convergence of the sequence of RVs $B_n^{-1}(S_n - A_n)$ to the degenerate RV. In this section we examine the convergence of $B_n^{-1}(S_n - A_n)$ to a nondegenerate RV. Suppose that for a suitable choice of constants $A_n$ and $B_n > 0$, the RVs $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. What are the properties of this limit RV $Y$? The question as posed is far too general and is not of much interest unless the RVs $X_i$ are suitably restricted. For example, if we take $X_1$ with DF $F$ and $X_2, X_3, \dots$ to be 0 with probability 1, choosing $A_n = 0$ and $B_n = 1$ leads to $F$ as the limit DF.

We recall (Example 6.5.6) that if $X_1, X_2, \dots, X_n$ are iid RVs with common law $C(1, 0)$, then $n^{-1}S_n$ is also $C(1, 0)$. Again, if $X_1, X_2, \dots, X_n$ are iid $N(0, 1)$ RVs, then $n^{-1/2}S_n$ is also $N(0, 1)$ (Corollary 2 to Theorem 5.3.22). We note thus that for certain sequences of RVs there exist sequences $A_n$ and $B_n > 0$, $B_n \to \infty$, such that $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. In the Cauchy case $B_n = n$, $A_n = 0$, and in the normal case $B_n = n^{1/2}$, $A_n = 0$. Moreover, we see that Cauchy and normal distributions appear as limiting distributions; in these two cases this is because of the reproductive nature of the distributions. Cauchy and normal distributions are examples of stable distributions.


Definition 1. Let $X_1, X_2$ be iid nondegenerate RVs with common DF $F$. Let $a_1, a_2$ be any positive constants. We say that $F$ is stable if there exist constants $A$ and $B$ (depending on $a_1, a_2$) such that the RV $B^{-1}(a_1X_1 + a_2X_2 - A)$ also has the DF $F$.


Let X1, X2,... be iid RVs with common DF F. We remark without proof (see 
Loève [64, p. 339]) that only stable distributions occur as limits. To make this statement more precise, we make the following definition.


Definition 2. Let $X_1, X_2, \dots$ be iid RVs with common DF $F$. We say that $F$ belongs to the domain of attraction of a distribution $V$ if there exist norming constants $B_n > 0$ and centering constants $A_n$ such that as $n \to \infty$,

$$P\{B_n^{-1}(S_n - A_n) \le x\} \to V(x) \tag{1}$$

at all continuity points $x$ of $V$.


In view of the statement after Definition 1, we see that only stable distributions 
possess domains of attraction. From Definition 1 we also note that each stable law 
belongs to its own domain of attraction. The study of stable distributions is beyond 
the scope of this book. We restrict ourselves to seeking conditions under which the 
limit law $V$ is the normal distribution. The importance of the normal distribution in
statistics is due largely to the fact that a wide class of distributions $F$ belongs to the
domain of attraction of the normal law. Let us consider some examples. 


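Example 1. Let $X_1, X_2, \dots, X_n$ be iid $b(1, p)$ RVs. Let

$$S_n = \sum_{k=1}^n X_k, \qquad A_n = ES_n = np, \qquad B_n = \sqrt{\operatorname{var}(S_n)} = \sqrt{np(1 - p)}.$$

Then, writing $q = 1 - p$,

$$M_n(t) = E\exp\left\{t\,\frac{S_n - np}{\sqrt{npq}}\right\} = \left[q\exp\left(-\frac{tp}{\sqrt{npq}}\right) + p\exp\left(\frac{tq}{\sqrt{npq}}\right)\right]^n = \left[1 + \frac{t^2}{2n} + o\left(\frac1n\right)\right]^n,$$

where the last step expands both exponentials: the terms linear in $t$ cancel, and the quadratic terms contribute $t^2(qp^2 + pq^2)/(2npq) = t^2/(2n)$. It follows from Lemma 6.5.1 that

$$M_n(t) \to e^{t^2/2} \quad \text{as } n \to \infty,$$

and since $e^{t^2/2}$ is the MGF of an $N(0, 1)$ RV, we have by the continuity theorem

$$P\left\{\frac{S_n - np}{\sqrt{np(1 - p)}} \le x\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt \quad \text{for all } x \in \mathbb{R}.$$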
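To see how close the limit already is for a moderate $n$, the Python sketch below (standard library only) computes the exact probability $P\{(S_n - np)/\sqrt{np(1-p)} \le x\}$ by summing binomial probabilities in log space. It compares the result with $\Phi(x)$ obtained from `math.erf`. The function names and the chosen $n$, $p$ are illustrative.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_standardized_cdf(x: float, n: int, p: float) -> float:
    """Exact P{(S_n - np)/sqrt(np(1-p)) <= x} for S_n ~ b(n, p)."""
    cutoff = n * p + x * math.sqrt(n * p * (1.0 - p))
    kmax = math.floor(cutoff)
    if kmax < 0:
        return 0.0
    kmax = min(kmax, n)
    total = 0.0
    for k in range(kmax + 1):
        # log of C(n, k) p^k (1-p)^(n-k), to avoid underflow for large n
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, binom_standardized_cdf(x, 10_000, 0.3), std_normal_cdf(x))
```

Part of the remaining discrepancy comes from the lattice nature of $S_n$, which is what the continuity correction discussed later in this section addresses.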


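Example 2. Let $X_1, X_2, \dots, X_n$ be iid $\chi^2(1)$ RVs. Then $S_n \sim \chi^2(n)$, $ES_n = n$, and $\operatorname{var}(S_n) = 2n$. Also let $Z_n = (S_n - n)/\sqrt{2n}$; then

$$\begin{aligned} M_n(t) = Ee^{tZ_n} &= \exp\left(-\frac{tn}{\sqrt{2n}}\right)\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}, \quad t < \sqrt{n/2},\\ &= \left[e^{t\sqrt{2/n}} - t\sqrt{2/n}\,e^{t\sqrt{2/n}}\right]^{-n/2}. \end{aligned}$$

Using Taylor's approximation, we get

$$e^{t\sqrt{2/n}} = 1 + t\sqrt{\frac2n} + \frac{t^2}{n} + \frac16\left(t\sqrt{\frac2n}\right)^3 e^{\theta_n},$$

where $\theta_n$ lies between 0 and $t\sqrt{2/n}$. It follows that

$$M_n(t) = \left(1 - \frac{t^2}{n} - \frac{\xi(n)}{n}\right)^{-n/2},$$

where

$$\xi(n) = \frac{\sqrt2\,t^3}{\sqrt n} - \frac{\sqrt2\,t^3}{3\sqrt n}\,e^{\theta_n} + \frac{2t^4}{3n}\,e^{\theta_n} \to 0 \quad \text{as } n \to \infty,$$

for every fixed $t$. We have from Lemma 6.5.1 that $M_n(t) \to e^{t^2/2}$ as $n \to \infty$ for all real $t$, and it follows that $Z_n \xrightarrow{L} Z$, where $Z$ is $N(0, 1)$.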

These examples suggest that if we take iid RVs with finite variance, and take $A_n = ES_n$, $B_n = \sqrt{\operatorname{var}(S_n)}$, then $B_n^{-1}(S_n - A_n) \xrightarrow{L} Z$, where $Z$ is $N(0, 1)$. This is the central limit result, which we now prove. The reader should note that in both Examples 1 and 2, we used more than just the existence of $E|X|^2$. Indeed, the MGF exists, and hence moments of all orders exist. The existence of the MGF is not a necessary condition.


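Theorem 1 (Lindeberg–Lévy Central Limit Theorem). Let $\{X_n\}$ be a sequence of iid RVs with $0 < \operatorname{var}(X_n) = \sigma^2 < \infty$ and common mean $\mu$. Let $S_n = \sum_{j=1}^n X_j$, $n = 1, 2, \dots$. Then for every $x \in \mathbb{R}$,

$$\lim_{n\to\infty} P\left\{\frac{S_n - n\mu}{\sigma\sqrt n} \le x\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$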


Proof. The proof we give here assumes that the MGF of $X_n$ exists. Without loss of generality, we also assume that $EX_n = 0$ and $\operatorname{var}(X_n) = 1$. Let $M$ be the MGF of $X_n$. Then the MGF of $S_n/\sqrt n$ is given by

$$M_n(t) = E\exp\left(\frac{tS_n}{\sqrt n}\right) = \left[M\left(\frac{t}{\sqrt n}\right)\right]^n,$$

and

$$\ln M_n(t) = n\ln M(t/\sqrt n) = \frac{L(t/\sqrt n)}{1/n},$$

where $L(t/\sqrt n) = \ln M(t/\sqrt n)$. Clearly, $L(0) = \ln(1) = 0$, so that as $n \to \infty$, the conditions for L'Hospital's rule are satisfied. It follows that

$$\lim_{n\to\infty}\ln M_n(t) = \lim_{n\to\infty}\frac{L'(t/\sqrt n)\,t}{2/\sqrt n},$$

and since $L'(0) = EX = 0$, we can use L'Hospital's rule once again, to get

$$\lim_{n\to\infty}\ln M_n(t) = \lim_{n\to\infty}\frac{L''(t/\sqrt n)\,t^2}{2} = \frac{t^2}{2},$$

using $L''(0) = \operatorname{var}(X) = 1$. Thus

$$M_n(t) \to \exp\left(\frac{t^2}{2}\right) = M(t),$$

where $M(t)$ is the MGF of an $N(0, 1)$ RV.
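To see the limit $\ln M_n(t) \to t^2/2$ of the proof at work, the sketch below takes, as an assumed example not in the text, $X$ uniform on $[-\sqrt3, \sqrt3]$ (mean 0, variance 1). Its MGF is $M(s) = \sinh(\sqrt3 s)/(\sqrt3 s)$, and the sketch evaluates $n\ln M(t/\sqrt n)$ in Python (standard library only).

```python
import math

def mgf_uniform_std(s: float) -> float:
    """MGF of U[-sqrt(3), sqrt(3)], which has mean 0 and variance 1."""
    if s == 0.0:
        return 1.0
    a = math.sqrt(3.0) * s
    return math.sinh(a) / a

def log_mgf_standardized_sum(t: float, n: int) -> float:
    """ln M_n(t) = n * ln M(t / sqrt(n)), the log-MGF of S_n / sqrt(n)."""
    return n * math.log(mgf_uniform_std(t / math.sqrt(n)))

if __name__ == "__main__":
    for n in (1, 10, 1000, 10**6):
        print(n, log_mgf_standardized_sum(1.0, n))  # tends to 1/2
```

Any other mean-zero, unit-variance law with an MGF would serve equally well; the uniform law was chosen only because its MGF has a closed form.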


Remark 1. In the proof above we could have used the Taylor series expansion of 
M to arrive at the same result. 


Remark 2. Even though we proved Theorem 1 for the case when the MGF of the $X_n$'s exists, we will use the result whenever $0 < \operatorname{var}(X_n) = \sigma^2 < \infty$. The use of CFs would have provided a complete proof of Theorem 1. Let $\phi$ be the CF of $X_n$. Assuming again, without loss of generality, that $EX_n = 0$, $\operatorname{var}(X_n) = 1$, we can write

$$\phi(t) = 1 - \frac12 t^2 + o(t^2).$$

Thus the CF of $S_n/\sqrt n$ is

$$\left[\phi\left(\frac{t}{\sqrt n}\right)\right]^n = \left[1 - \frac{t^2}{2n} + o\left(\frac1n\right)\right]^n,$$

which converges to $\exp(-t^2/2)$, the CF of an $N(0, 1)$ RV. The devil is in the details of the proof.


The following converse to Theorem 1 holds. 


Theorem 2. Let $X_1, X_2, \dots$ be iid RVs such that $n^{-1/2}S_n$ has the same distribution for every $n = 1, 2, \dots$. Then, if $EX_i = 0$, $\operatorname{var}(X_i) = 1$, the distribution of $X_i$ must be $N(0, 1)$.

Proof. Let $F$ be the DF of $n^{-1/2}S_n$. By the central limit theorem,

$$\lim_{n\to\infty} P\{n^{-1/2}S_n \le x\} = \Phi(x).$$

Also, $P\{n^{-1/2}S_n \le x\} = F(x)$ for each $n$. It follows that we must have $F(x) = \Phi(x)$. Taking $n = 1$, $F$ is the DF of $X_1$, so each $X_i$ is $N(0, 1)$.


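Example 3. Let $X_1, X_2, \dots$ be iid RVs with common PMF

$$P\{X = k\} = p(1 - p)^k, \quad k = 0, 1, 2, \dots, \quad 0 < p < 1, \quad q = 1 - p.$$

Then $EX = q/p$, $\operatorname{var}(X) = q/p^2$. By Theorem 1 we see that

$$P\left\{\frac{S_n - n(q/p)}{\sqrt{nq}/p} \le x\right\} \to \Phi(x) \quad \text{as } n \to \infty \text{ for all } x \in \mathbb{R}.$$

Example 4. Let $X_1, X_2, \dots$ be iid RVs with common $B(\alpha, \beta)$ distribution. Then

$$EX = \frac{\alpha}{\alpha + \beta}, \qquad \operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$

By Theorem 1, it follows that

$$\frac{S_n - n\alpha/(\alpha + \beta)}{\sqrt{\alpha\beta n/[(\alpha + \beta + 1)(\alpha + \beta)^2]}} \xrightarrow{L} Z,$$

where $Z$ is $N(0, 1)$.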


For nonidentically distributed RVs we state, without proof, the following result due to Lindeberg.

Theorem 3. Let $X_1, X_2, \dots$ be independent RVs with DFs $F_1, F_2, \dots$, respectively. Let $EX_k = \mu_k$ and $\operatorname{var}(X_k) = \sigma_k^2$, and write

$$s_n^2 = \sum_{j=1}^n \sigma_j^2.$$

If the $F_k$'s are absolutely continuous with PDF $f_k$, assume that the relation

$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2 f_k(x)\,dx = 0 \tag{2}$$

holds for all $\varepsilon > 0$. (A similar condition can be stated for the discrete case.) Then

$$S_n^* = \sum_{j=1}^n\frac{X_j - \mu_j}{s_n} \xrightarrow{L} Z, \quad Z \sim N(0, 1). \tag{3}$$

Condition (2) is known as the Lindeberg condition.

Feller [21] has shown that condition (2) is necessary as well in the following sense. For independent RVs $\{X_k\}$ for which (3) holds and

$$P\left\{\max_{1 \le k \le n}|X_k - EX_k| > \varepsilon\sqrt{\operatorname{var}(S_n)}\right\} \to 0,$$

(2) holds for every $\varepsilon > 0$.


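Example 5. Let $X_1, X_2, \dots$ be independent RVs such that $X_k$ is $U(-a_k, a_k)$. Then $EX_k = 0$, $\operatorname{var}(X_k) = \frac13 a_k^2$. Suppose that $|a_k| < A$ and $\sum_{k=1}^n a_k^2 \to \infty$ as $n \to \infty$. Then

$$\begin{aligned} \frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} x^2 f_k(x)\,dx &\le \frac{1}{s_n^2}\sum_{k=1}^n A^2\int_{|x| > \varepsilon s_n} f_k(x)\,dx = \frac{A^2}{s_n^2}\sum_{k=1}^n P\{|X_k| > \varepsilon s_n\}\\ &\le \frac{A^2}{s_n^2}\sum_{k=1}^n\frac{\operatorname{var}(X_k)}{\varepsilon^2 s_n^2} = \frac{A^2}{\varepsilon^2 s_n^2} \to 0 \quad \text{as } n \to \infty. \end{aligned}$$

If $\sum_{k=1}^\infty a_k^2 < \infty$, then $s_n^2 \uparrow A^2$, say, as $n \to \infty$. For fixed $k$, we can find $\varepsilon_k$ such that $\varepsilon_k A < a_k$, and then $P\{|X_k| > \varepsilon_k s_n\} \ge P\{|X_k| > \varepsilon_k A\} > 0$. For $n \ge k$, we have

$$\frac{1}{s_n^2}\sum_{j=1}^n\int_{|x| > \varepsilon_k s_n} x^2 f_j(x)\,dx \ge \frac{\varepsilon_k^2 s_n^2}{s_n^2}\sum_{j=1}^n P\{|X_j| > \varepsilon_k s_n\} \ge \varepsilon_k^2 P\{|X_k| > \varepsilon_k A\} > 0,$$

so that the Lindeberg condition does not hold. Indeed, if $X_1, X_2, \dots$ are independent RVs such that there exists a constant $A$ with $P\{|X_n| \le A\} = 1$ for all $n$, the Lindeberg condition (2) is satisfied if $s_n^2 \to \infty$ as $n \to \infty$. To see this, suppose that $s_n^2 \to \infty$. Since the $X_n$'s are uniformly bounded, so are the RVs $X_k - EX_k$. It follows that for every $\varepsilon > 0$ we can find an $N_\varepsilon$ such that for $n \ge N_\varepsilon$, $P\{|X_k - EX_k| < \varepsilon s_n,\ k = 1, 2, \dots, n\} = 1$. The Lindeberg condition follows immediately, since every integral in (2) then vanishes. The converse also holds, for if $\lim_{n\to\infty} s_n^2 < \infty$ and the Lindeberg condition holds, there exists a constant $A < \infty$ such that $s_n^2 \to A^2$. For any fixed $j$, we can find an $\varepsilon > 0$ such that $P\{|X_j - \mu_j| > \varepsilon A\} > 0$. Then, for $n \ge j$,

$$\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2\,dF_k(x) \ge \varepsilon^2\sum_{k=1}^n P\{|X_k - \mu_k| > \varepsilon s_n\} \ge \varepsilon^2 P\{|X_j - \mu_j| > \varepsilon A\} > 0,$$

and the Lindeberg condition does not hold. This contradiction shows that $s_n^2 \to \infty$ is also a necessary condition; that is, for a sequence of uniformly bounded independent RVs, a necessary and sufficient condition for the central limit theorem to hold is

$$s_n^2 \to \infty \quad \text{as } n \to \infty.$$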
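Both halves of Example 5 can be checked numerically. For $X_k \sim U(-a_k, a_k)$ and $0 \le c < a_k$, $\int_{|x| > c} x^2 f_k(x)\,dx = (a_k^3 - c^3)/(3a_k)$, and the integral is 0 when $c \ge a_k$. The Python sketch below uses this formula to evaluate the Lindeberg sum in (2) for two illustrative choices. The first is $a_k = 1$ (bounded, $s_n^2 \to \infty$). The second is $a_k = 2^{-k}$ ($\sum a_k^2 < \infty$). The function names are our own.

```python
import math

def uniform_tail_second_moment(a: float, c: float) -> float:
    """For X ~ U(-a, a): the integral of x^2 f(x) over |x| > c."""
    if c >= a:
        return 0.0
    c = max(c, 0.0)
    return (a**3 - c**3) / (3.0 * a)

def lindeberg_ratio(half_widths, eps: float) -> float:
    """(1/s_n^2) * sum_k E[X_k^2; |X_k| > eps*s_n] for independent X_k ~ U(-a_k, a_k)."""
    s2 = sum(a * a for a in half_widths) / 3.0   # var(X_k) = a_k^2 / 3
    s = math.sqrt(s2)
    return sum(uniform_tail_second_moment(a, eps * s) for a in half_widths) / s2

if __name__ == "__main__":
    for n in (10, 100, 400):
        print("a_k = 1:", n, lindeberg_ratio([1.0] * n, 0.1))        # becomes exactly 0
        print("a_k = 2^-k:", n, lindeberg_ratio([2.0**-k for k in range(1, n + 1)], 0.5))
```

With $a_k = 1$ the sum is exactly 0 once $\varepsilon s_n \ge 1$, as the argument above predicts. With $a_k = 2^{-k}$ it stays near 0.85 for $\varepsilon = 0.5$, no matter how large $n$ is.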
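Example 6. Let $X_1, X_2, \dots$ be independent RVs such that $\alpha_k = E|X_k|^{2+\delta} < \infty$ for some $\delta > 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_n = o(s_n^{2+\delta})$. Then the Lindeberg condition is satisfied, and the central limit theorem holds. This result is due to Lyapunov. Taking $EX_k = 0$ for simplicity, and noting that $|x|/(\varepsilon s_n) > 1$ on the range of integration, we have

$$\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} x^2 f_k(x)\,dx \le \frac{1}{\varepsilon^\delta s_n^{2+\delta}}\sum_{k=1}^n\int_{-\infty}^{\infty}|x|^{2+\delta} f_k(x)\,dx = \frac{\sum_{k=1}^n\alpha_k}{\varepsilon^\delta s_n^{2+\delta}} \to 0$$

as $n \to \infty$. A similar argument applies in the discrete case.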

Remark 3. Both the central limit theorem (CLT) and the (weak) law of large numbers (WLLN) hold for a large class of sequences of RVs $\{X_n\}$. If the $\{X_n\}$ are independent uniformly bounded RVs, that is, if $P\{|X_n| \le M\} = 1$, the WLLN (Theorem 6.3.1) holds; the CLT holds provided that $s_n^2 \to \infty$ (Example 5).

If the RVs $\{X_n\}$ are iid, then the CLT is a stronger result than the WLLN in that the former provides an estimate of the probability $P\{|S_n - n\mu|/n \ge \varepsilon\}$. Indeed,

$$P\{|S_n - n\mu| > n\varepsilon\} = P\left\{\frac{|S_n - n\mu|}{\sigma\sqrt n} > \frac{\varepsilon\sqrt n}{\sigma}\right\} \approx 1 - P\left\{|Z| \le \frac{\varepsilon\sqrt n}{\sigma}\right\},$$

where $Z$ is $N(0, 1)$, and the law of large numbers follows. On the other hand, we note that the WLLN does not require the existence of a second moment.


Remark 4. If $\{X_n\}$ are independent RVs, it is quite possible that the CLT applies to the $X_n$'s but the WLLN does not.


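Example 7 (Feller [22, p. 255]). Let $\{X_k\}$ be independent RVs with PMF

$$P\{X_k = k^\lambda\} = P\{X_k = -k^\lambda\} = \tfrac12, \quad k = 1, 2, \dots.$$

Then $EX_k = 0$, $\operatorname{var}(X_k) = k^{2\lambda}$. Also let $\lambda > 0$; then

$$\frac{n^{2\lambda+1}}{2\lambda+1} = \int_0^n x^{2\lambda}\,dx \le s_n^2 = \sum_{k=1}^n k^{2\lambda} \le \int_1^{n+1} x^{2\lambda}\,dx < \frac{(n+1)^{2\lambda+1}}{2\lambda+1}.$$

It follows that if $0 < \lambda < \frac12$, $s_n/n \to 0$, and by Corollary 2 to Theorem 6.3.1, the WLLN holds. Now $k^\lambda \le n^\lambda$ for $k \le n$, so that the sum

$$\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} x^2\,dF_k(x)$$

can be nonzero only if $n^\lambda > \varepsilon s_n$, where $\varepsilon s_n \ge \varepsilon n^{\lambda+1/2}/\sqrt{2\lambda+1}$. It follows that as long as $n \ge (2\lambda+1)\varepsilon^{-2}$,

$$\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} x^2\,dF_k(x) = 0,$$

and the Lindeberg condition holds. Thus the CLT holds for $\lambda > 0$. This means that

$$P\left\{a < \frac{S_n}{s_n} \le b\right\} \to \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt.$$

Since $s_n \sim n^{\lambda+1/2}/\sqrt{2\lambda+1}$, it follows that

$$P\left\{\frac{a\,n^{\lambda+1/2-1}}{\sqrt{2\lambda+1}} < \frac{S_n}{n} \le \frac{b\,n^{\lambda+1/2-1}}{\sqrt{2\lambda+1}}\right\} \to \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt,$$

and the WLLN cannot hold for $\lambda \ge \frac12$.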


We conclude this section with some remarks concerning the application of the CLT. Let $X_1, X_2, \dots$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Let us write

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt n},$$

and let $z_1, z_2$ be two arbitrary real numbers with $z_1 < z_2$. If $F_n$ is the DF of $Z_n$, then

$$\lim_{n\to\infty} P\{z_1 < Z_n \le z_2\} = \lim_{n\to\infty}[F_n(z_2) - F_n(z_1)] = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt,$$

that is,

$$\lim_{n\to\infty} P\{z_1\sigma\sqrt n + n\mu < S_n \le z_2\sigma\sqrt n + n\mu\} = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt. \tag{4}$$

It follows that the RV $S_n = \sum_{k=1}^n X_k$ is asymptotically normally distributed (see Section 7.5) with mean $n\mu$ and variance $n\sigma^2$. Equivalently, the RV $n^{-1}S_n$ is asymptotically $N(\mu, \sigma^2/n)$. This result is of great importance in statistics.

In Fig. 1 we show the distribution of $\bar X$ in sampling from $P(\lambda)$ and $G(1, 1)$. We have also superimposed, in each case, the graph of the corresponding normal approximation.

How large should $n$ be before we apply approximation (4)? Unfortunately, the answer is not simple. Much depends on the underlying distribution, the corresponding
speed of convergence and the accuracy one desires. There is a vast amount of liter- 
ature on the speed of convergence and error bounds. We will content ourselves with 
some examples. The reader is referred to Rohatgi [88] for a detailed discussion. 


[Figure: exact density of $\bar X$ with the normal approximation superimposed; two panels, (a) and (b).]

Fig. 1. (a) Distribution of $\bar X$ for Poisson RV with mean 3 and normal approximation; (b) distribution of $\bar X$ for exponential RV with mean 1 and normal approximation.


In the discrete case when the underlying distribution is integer valued, approximation (4) is improved by applying the continuity correction. If $X$ is integer valued, then for integers $x_1, x_2$

$$P\{x_1 \le X \le x_2\} = P\{x_1 - \tfrac12 < X < x_2 + \tfrac12\},$$

which amounts to making the discrete space of values of $X$ continuous by considering intervals of length 1 with midpoints at integers.


Example 8. Let $X_1, X_2, \dots, X_n$ be iid $b(1, p)$ RVs. Then $ES_n = np$ and $\operatorname{var}(S_n) = np(1 - p)$, so $(S_n - np)/\sqrt{np(1 - p)}$ is approximately $N(0, 1)$. Write $X = S_n$.

Suppose that $n = 10$, $p = \frac12$. Then from binomial tables, $P(X \le 4) = 0.3770$. Using the normal approximation without continuity correction,

$$P(X \le 4) \approx P\left(Z \le \frac{4 - 5}{\sqrt{2.5}}\right) = P(Z \le -0.63) = 0.2643.$$

Applying the continuity correction,

$$P(X \le 4) = P(X \le 4.5) \approx P(Z \le -0.32) = 0.3745.$$

Next suppose that $n = 100$, $p = 0.1$. Then from binomial tables $P(X = 7) = 0.0889$. Using the normal approximation without continuity correction,

$$P(X = 7) = P(6.0 < X < 8.0) \approx P(-1.33 < Z < -0.67) = 0.1596,$$

and with continuity correction

$$P(X = 7) = P(6.5 < X < 7.5) \approx P(-1.17 < Z < -0.83) = 0.0823.$$

The rule of thumb is to use the continuity correction and the normal approximation whenever $np(1 - p) > 10$, and the Poisson approximation with $\lambda = np$ for $p < 0.1$, $\lambda < 10$.
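The figures in Example 8 can be reproduced approximately with the Python sketch below (standard library only; $\Phi$ is computed from `math.erf`). Because it does not round $z$ to two decimals before consulting a table, its results may differ from the quoted values in the third decimal place. The helper names are illustrative.

```python
import math

def phi_cdf(z: float) -> float:
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact P{X <= x} for X ~ b(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def normal_approx_cdf(x: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation to P{X <= x}, optionally with the continuity correction x + 1/2."""
    mu, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return phi_cdf(((x + 0.5 if continuity else x) - mu) / sd)

def normal_approx_point(x: int, n: int, p: float) -> float:
    """Continuity-corrected approximation to P{X = x}, i.e. P{x - 1/2 < X < x + 1/2}."""
    mu, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return phi_cdf((x + 0.5 - mu) / sd) - phi_cdf((x - 0.5 - mu) / sd)

if __name__ == "__main__":
    print(binom_cdf(4, 10, 0.5), normal_approx_cdf(4, 10, 0.5, False), normal_approx_cdf(4, 10, 0.5))
    print(binom_pmf(7, 100, 0.1), normal_approx_point(7, 100, 0.1))
```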


Example 9. Let $X_1, X_2, \dots$ be iid $P(\lambda)$ RVs. Then $S_n$ has approximately an $N(n\lambda, n\lambda)$ distribution for large $n$. Let $n = 64$, $\lambda = 0.125$. Then $S_n \sim P(8)$, and from Poisson distribution tables, $P(S_n = 10) = 0.099$. Using the normal approximation,

$$P(S_n = 10) = P(9.5 < S_n < 10.5) \approx P(0.53 < Z < 0.88) = 0.1087.$$

If $n = 96$, $\lambda = 0.125$, then $S_n \sim P(12)$ and

$$\begin{aligned} P(S_n = 10) &= 0.105, && \text{exact},\\ P(S_n = 10) &= P(9.5 < S_n < 10.5) \approx P(-0.72 < Z < -0.43) = 0.0978, && \text{normal approximation}. \end{aligned}$$
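A similar check for Example 9, again in Python with the standard library only: the sketch computes the exact Poisson probability and the continuity-corrected normal approximation $P\{k - \frac12 < N(\lambda, \lambda) < k + \frac12\}$ for the total mean $n\lambda$. It uses unrounded $z$-values, so the last digits can differ slightly from table-based figures. The helper names are illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P{S = k} for S ~ P(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def phi_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_point_poisson(k: int, lam: float) -> float:
    """Approximate P{S = k}, S ~ P(lam), by P{k - 1/2 < N(lam, lam) < k + 1/2}."""
    sd = math.sqrt(lam)
    return phi_cdf((k + 0.5 - lam) / sd) - phi_cdf((k - 0.5 - lam) / sd)

if __name__ == "__main__":
    for n in (64, 96):
        lam = n * 0.125  # S_n ~ P(n * lambda)
        print(n, round(poisson_pmf(10, lam), 4), round(normal_approx_point_poisson(10, lam), 4))
```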


PROBLEMS 6.6 


1. Let $\{X_n\}$ be a sequence of independent RVs with the following distributions. In each case, does the Lindeberg condition hold?

(a) $P\{X_n = \pm(1/2^n)\} = \frac12$.

(b) $P\{X_n = \pm 2^{n+1}\} = 1/2^{n+3}$, $P\{X_n = 0\} = 1 - (1/2^{n+2})$.

(c) $P\{X_n = \pm 1\} = (1 - 2^{-n})/2$, $P\{X_n = \pm 2^{-n}\} = 1/2^{n+1}$.

(d) $\{X_n\}$ is a sequence of independent Poisson RVs with parameter $\lambda_n$, $n = 1, 2, \dots$, such that $\sum_{k=1}^n \lambda_k \to \infty$.

(e) $P\{X_n = \pm 2^n\} = \frac12$.


2. Let $X_1, X_2, \dots$ be iid RVs with mean 0, variance 1, and $EX_i^4 < \infty$. Find the limiting distribution of

$$Z_n = \frac{X_1X_2 + X_3X_4 + \cdots + X_{2n-1}X_{2n}}{\sqrt n}.$$


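3. Let $X_1, X_2, \dots$ be iid RVs with mean $\alpha$ and variance $\sigma^2$, and let $Y_1, Y_2, \dots$ be iid RVs with mean $\beta\ (\ne 0)$ and variance $\tau^2$. Find the limiting distribution of $Z_n = \sqrt n(\bar X_n - \alpha)/\bar Y_n$, where $\bar X_n = n^{-1}\sum_{i=1}^n X_i$ and $\bar Y_n = n^{-1}\sum_{i=1}^n Y_i$.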


4. Let $X \sim b(n, \theta)$. Use the CLT to find $n$ such that $P_\theta\{X > n/2\} > 1 - \alpha$. In particular, let $\alpha = 0.10$ and $\theta = 0.45$. Calculate $n$ satisfying $P\{X > n/2\} > 0.90$.


5. Let $X_1, X_2, \dots$ be a sequence of iid RVs with common mean $\mu$ and variance $\sigma^2$. Also, let $\bar X = n^{-1}\sum_{k=1}^n X_k$ and $S^2 = (n - 1)^{-1}\sum_{k=1}^n (X_k - \bar X)^2$. Show that $\sqrt n(\bar X - \mu)/S \xrightarrow{L} Z$, where $Z \sim N(0, 1)$.


6. Let $X_1, X_2, \dots, X_{100}$ be iid RVs with mean 75 and variance 225. Use Chebyshev's inequality to calculate the probability that the sample mean will not differ from the population mean by more than 6. Then use the CLT to calculate the same probability, and compare your results.


7. Let $X_1, X_2, \dots, X_{100}$ be iid $P(\lambda)$ RVs, where $\lambda = 0.02$. Let $S = S_{100} = \sum_{i=1}^{100} X_i$. Use the central limit result to evaluate $P\{S > 3\}$, and compare your result to the exact probability of the event $S > 3$.


8. Let $X_1, X_2, \dots, X_{81}$ be iid RVs with mean 54 and variance 225. Use Chebyshev's inequality to find the possible difference between the sample mean and the population mean with a probability of at least 0.75. Also use the CLT to do the same.


9. Use the CLT applied to a Poisson RV to show that $\lim_{n\to\infty} e^{-nt}\sum_{k=0}^{n-1}(nt)^k/k! = 1$ for $0 < t < 1$, $= \frac12$ if $t = 1$, and $0$ if $t > 1$.

10. Let $X_1, X_2, \dots$ be a sequence of iid RVs with mean $\mu$ and variance $\sigma^2$, and assume that $EX_1^4 < \infty$. Write $V_n = \sum_{k=1}^n (X_k - \mu)^2$. Find the centering and norming constants $A_n$ and $B_n$ such that $B_n^{-1}(V_n - A_n) \xrightarrow{L} Z$, where $Z$ is $N(0, 1)$.


11. From an urn containing 10 identical balls numbered 0 through 9, $n$ balls are drawn with replacement.

(a) What does the law of large numbers tell you about the appearance of 0's in the $n$ drawings?

(b) How many drawings must be made in order that, with probability at least 0.95, the relative frequency of the occurrence of 0's will be between 0.09 and 0.11?

(c) Use the CLT to find the probability that among the $n$ numbers thus chosen, the number 5 will appear between $(n - 3\sqrt n)/10$ and $(n + 3\sqrt n)/10$ times (inclusive) if (i) $n = 25$, and (ii) $n = 100$.

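12. Let $X_1, X_2, \dots, X_n$ be iid RVs with $EX_i = 0$ and $EX_i^2 = \sigma^2 < \infty$. Let $\bar X = \sum_{i=1}^n X_i/n$, and for any positive real number $\varepsilon$, let $P_n = P\{\bar X \ge \varepsilon\}$. Show that

$$P_n \approx \frac{\sigma}{\varepsilon\sqrt n}\,\frac{1}{\sqrt{2\pi}}\,e^{-n\varepsilon^2/(2\sigma^2)} \quad \text{as } n \to \infty.$$

[Hint: Use (5.3.61).]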


CHAPTER 7 


Sample Moments and 
Their Distributions 


7.1 INTRODUCTION


In the preceding chapters we discussed fundamental ideas and techniques of prob- 
ability theory. In this development we created a mathematical model of a random 
experiment by associating with it a sample space in which random events correspond to sets of a certain $\sigma$-field. The notion of probability defined on this $\sigma$-field
corresponds to the notion of uncertainty in the outcome on any performance of the 
random experiment. 

In this chapter we begin the study of some problems of mathematical statistics. 
The methods of probability theory learned in preceding chapters are used extensively 
in this study. Suppose that we seek information about some numerical characteristics 
of a collection of elements, called a population. For reasons of time or cost we may 
not wish or be able to study each element of the population. Our object is to draw 
conclusions about the unknown population characteristics on the basis of information 
on some characteristics of a suitably selected sample. Formally, let X be a random 
variable that describes the population under investigation, and let F be the DF of X. 
There are two possibilities. Either $X$ has a DF $F_\theta$ with a known functional form
(except perhaps for the parameter $\theta$, which may be a vector), or $X$ has a DF $F$ about
which we know nothing (except perhaps that $F$ is, say, absolutely continuous). In the
former case let $\Theta$ be the set of possible values of the unknown parameter $\theta$. Then
the job of a statistician is to decide on the basis of a suitably selected sample which
member or members of the family $\{F_\theta, \theta \in \Theta\}$ can represent the DF of $X$. Problems
of this type, called problems of parametric statistical inference, are the subject of 
investigation in Chapters 8 through 12. The case in which nothing is known about the 
functional form of the DF F of X is clearly much more difficult. Inference problems 
of this type fall into the domain of nonparametric statistics and are discussed in 
Chapter 13. 

To be sure, the scope of statistical methods is much wider than the statistical infer- 
ence problems discussed in this book. Statisticians, for example, deal with problems 
of planning and designing experiments, of collecting information, and of deciding
how best the collected information should be used. However, here we concern our- 
selves only with the best methods of making inferences about probability distribu- 
tions. 

In Section 7.2 we introduce the notions of a (simple) random sample and sample 
statistics. In Section 7.3 we study sample moments and their exact distributions, and 
in Section 7.5 we consider their large-sample approximations. In Section 7.4 we con- 
sider some important distributions that arise in sampling from a normal population. 
Sections 7.6 and 7.7 are devoted to the study of sampling from univariate and bivari- 
ate normal distributions. 


7.2 RANDOM SAMPLING 


Consider a statistical experiment that culminates in outcomes x, which are the values 
assumed by an RV X. Let F be the DF of X. In practice, F will not be completely 
known; that is, one or more parameters associated with F will be unknown. The 
job of a statistician is to estimate these unknown parameters or to test the validity 
of certain statements about them. She can obtain n independent observations on 
$X$. This means that she observes $n$ values $x_1, x_2, \dots, x_n$ assumed by the RV $X$. Each $x_i$ can be regarded as the value assumed by an RV $X_i$, $i = 1, 2, \dots, n$, where $X_1, X_2, \dots, X_n$ are independent RVs with common DF $F$. The observed values $(x_1, x_2, \dots, x_n)$ are then values assumed by $(X_1, X_2, \dots, X_n)$. The set $\{X_1, X_2, \dots, X_n\}$ is then a sample of size $n$ taken from a population distribution $F$. The set of $n$ values $x_1, x_2, \dots, x_n$ is called a realization of the sample. Note that the possible values of the RV $(X_1, X_2, \dots, X_n)$ can be regarded as points in $\mathcal{R}_n$, which may be called the sample space. In practice one observes not $x_1, x_2, \dots, x_n$ but some function $f(x_1, x_2, \dots, x_n)$. Then $f(x_1, x_2, \dots, x_n)$ are values assumed by the RV $f(X_1, X_2, \dots, X_n)$.
Let us now formalize these concepts. 


Definition 1. Let $X$ be an RV with DF $F$, and let $X_1, X_2, \dots, X_n$ be iid RVs with common DF $F$. Then the collection $X_1, X_2, \dots, X_n$ is known as a random sample of size $n$ from the DF $F$ or simply as $n$ independent observations on $X$.

If $X_1, X_2, \dots, X_n$ is a random sample from $F$, their joint DF is given by

$$F^*(x_1, x_2, \dots, x_n) = \prod_{i=1}^n F(x_i). \tag{1}$$


Definition 2. Let $X_1, X_2, \dots, X_n$ be $n$ independent observations on an RV $X$, and let $f: \mathcal{R}_n \to \mathcal{R}_k$ be a Borel-measurable function. Then the RV $f(X_1, X_2, \dots, X_n)$ is called a (sample) statistic provided that it is not a function of any unknown parameter(s).
Two of the most commonly used statistics are defined as follows.


Definition 3. Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution function $F$. Then the statistic

$$\bar X = n^{-1}S_n = \sum_{i=1}^n\frac{X_i}{n} \tag{2}$$

is called the sample mean, and the statistic

$$S^2 = \sum_{i=1}^n\frac{(X_i - \bar X)^2}{n - 1} = \frac{\sum_{i=1}^n X_i^2 - n\bar X^2}{n - 1} \tag{3}$$

is called the sample variance, and $S$ is called the sample standard deviation.


Remark 1. Whenever the word sample is used subsequently, it will mean ran- 
dom sample. 


Remark 2. Sampling from a probability distribution (Definition 1) is sometimes 
referred to as sampling from an infinite population since one can obtain samples of 
any size one desires even if the population is finite (by sampling with replacement). 


Remark 3. In sampling without replacement from a finite population, the independence condition of Definition 1 is not satisfied. Suppose that a sample of size 2 is taken from a finite population $\{a_1, a_2, \dots, a_N\}$ without replacement. Let $X_i$ be the outcome on the $i$th draw. Then $P\{X_1 = a_j\} = 1/N$, $P\{X_2 = a_i \mid X_1 = a_j\} = 1/(N - 1)$ for $i \ne j$, and $P\{X_2 = a_j \mid X_1 = a_j\} = 0$. Thus the PMF of $X_2$ depends on the outcome of the first draw (that is, on the value of $X_1$), and $X_1$ and $X_2$ are not independent. Note, however, that

$$P\{X_2 = a_i\} = \sum_{j=1}^N P\{X_1 = a_j\}P\{X_2 = a_i \mid X_1 = a_j\} = \sum_{j \ne i} P\{X_1 = a_j\}P\{X_2 = a_i \mid X_1 = a_j\} = (N - 1)\cdot\frac1N\cdot\frac{1}{N - 1} = \frac1N,$$

and $X_1 \overset{d}{=} X_2$. A similar argument can be used to show that $X_1, X_2, \dots, X_n$ all have the same distribution, but they are not independent. In fact, $X_1, X_2, \dots, X_n$ are exchangeable RVs. Sampling without replacement from a finite population is often referred to as simple random sampling.
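The computation in Remark 3 can be confirmed by brute-force enumeration. In the Python sketch below (standard library only), every ordered pair of distinct population members is equally likely, which models two draws without replacement. The population labels and function names are hypothetical. The sketch shows that $X_1$ and $X_2$ have the same marginal PMF $1/N$ while their joint PMF does not factor.

```python
from fractions import Fraction
from itertools import permutations

def draw_pair_distribution(population):
    """Joint PMF of (X_1, X_2) for two draws without replacement:
    every ordered pair of distinct members has probability 1 / (N (N - 1))."""
    pairs = list(permutations(population, 2))
    p = Fraction(1, len(pairs))
    return {pair: p for pair in pairs}

def marginal(joint, index):
    """Marginal PMF of X_1 (index 0) or X_2 (index 1)."""
    out = {}
    for pair, prob in joint.items():
        out[pair[index]] = out.get(pair[index], 0) + prob
    return out

if __name__ == "__main__":
    population = ["a1", "a2", "a3", "a4"]      # hypothetical labels, N = 4
    joint = draw_pair_distribution(population)
    m1, m2 = marginal(joint, 0), marginal(joint, 1)
    print(m1 == m2, m2["a1"])                   # same law, each value has probability 1/N
    print(joint.get(("a1", "a1"), 0), m1["a1"] * m2["a1"])  # 0 versus 1/16: not independent
```

Exact fractions are used so that the equality $P\{X_2 = a_i\} = 1/N$ is checked exactly rather than up to rounding.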


Remark 4. It should be remembered that sample statistics $\bar X$, $S^2$ (and others that we will define later) are random variables, while the population parameters $\mu$, $\sigma^2$, and so on, are fixed constants that may be unknown.
Remark 5. In (3) we divide by $n - 1$ rather than $n$. The reason for this will become clear in the next section.


Remark 6. Other frequently occurring examples of statistics are sample order 
statistics $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ and their functions, as well as sample moments, which
will be studied in the next section. 


Example 1. Let $X \sim b(1, p)$, where $p$ is possibly unknown. The DF of $X$ is given by

$$F(x) = p\,\varepsilon(x - 1) + (1 - p)\,\varepsilon(x), \quad x \in \mathbb{R}.$$

Suppose that five independent observations on $X$ are 0, 1, 1, 1, 0. Then 0, 1, 1, 1, 0 is a realization of the sample $X_1, X_2, \dots, X_5$. The sample mean is

$$\bar x = \frac{0 + 1 + 1 + 1 + 0}{5} = 0.6,$$

which is the value assumed by the RV $\bar X$. The sample variance is

$$s^2 = \frac{\sum_{i=1}^5 (x_i - \bar x)^2}{5 - 1} = \frac{2(0.6)^2 + 3(0.4)^2}{4} = 0.3,$$

which is the value assumed by the RV $S^2$. Also $s = \sqrt{0.3} = 0.55$.
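The arithmetic of Example 1 can be checked with Python's standard `statistics` module. Note that `statistics.variance` divides by $n - 1$, which matches definition (3). The second expression evaluates the alternative form $(\sum x_i^2 - n\bar x^2)/(n - 1)$ given in (3).

```python
import math
import statistics

realization = [0, 1, 1, 1, 0]            # the five observations of Example 1
n = len(realization)
xbar = statistics.mean(realization)       # sample mean, (2)
s2 = statistics.variance(realization)     # sample variance with divisor n - 1, (3)
s2_alt = (sum(x * x for x in realization) - n * xbar ** 2) / (n - 1)  # second form of (3)
s = math.sqrt(s2)                         # sample standard deviation
print(xbar, s2, round(s, 2))
```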


Example 2. Let $X \sim N(\mu, \sigma^2)$, where $\mu$ is known but $\sigma^2$ is unknown. Let $X_1, X_2, \dots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then, according to our definition, $\sum_{i=1}^n X_i/\sigma^2$ is not a statistic.

Suppose that five observations on $X$ are $-0.864$, $0.561$, $2.355$, $0.582$, $-0.774$. Then the sample mean is 0.372, and the sample variance is 1.648.


PROBLEMS 7.2 


1. Let $X$ be a $b(1, \frac12)$ RV, and consider all possible random samples of size 3 on $X$. Compute $\bar X$ and $S^2$ for each of the eight samples, and also compute the PMFs of $\bar X$ and $S^2$.


2. A die is rolled. Let $X$ be the face value that turns up, and $X_1, X_2$ be two independent observations on $X$. Compute the PMF of $\bar X$.


3. Let $X_1, X_2, \dots, X_n$ be a sample from some population. Show that

$$\max_{1 \le i \le n}|X_i - \bar X| < \frac{(n - 1)S}{\sqrt n}$$

unless either all the $n$ observations are equal or exactly $n - 1$ of the $X_i$'s are equal. (Samuelson [97])


4. Let $x_1, x_2, \dots, x_n$ be real numbers, and let $x_{(n)} = \max\{x_1, x_2, \dots, x_n\}$, $x_{(1)} = \min\{x_1, x_2, \dots, x_n\}$. Show that for any set of real numbers $a_1, a_2, \dots, a_n$ such that $\sum_{i=1}^n a_i = 0$, the following inequality holds:

$$\left|\sum_{i=1}^n a_ix_i\right| \le \frac12\left(x_{(n)} - x_{(1)}\right)\sum_{i=1}^n|a_i|.$$

5. For any set of real numbers $x_1, x_2, \dots, x_n$, show that the fraction of $x_1, x_2, \dots, x_n$ included in the interval $(\bar x - ks, \bar x + ks)$ for $k > 1$ is at least $1 - 1/k^2$. Here $\bar x$ is the mean and $s$ the standard deviation of the $x$'s.
i=l 








7.3 SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS 


Let X1, X2, ... , Xn be a sample from a population DF F. In this section we consider 
some commonly used sample characteristics and their distributions. 


Definition 1. Let $F_n^*(x) = n^{-1}\sum_{j=1}^n \varepsilon(x - X_j)$. Then $nF_n^*(x)$ is the number of $X_k$'s $(1 \le k \le n)$ that are $\le x$. $F_n^*(x)$ is called the sample (or empirical) distribution function.


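We note that $0 \le F_n^*(x) \le 1$ for all $x$, and moreover, that $F_n^*$ is right continuous, nondecreasing, and $F_n^*(-\infty) = 0$, $F_n^*(\infty) = 1$. Thus $F_n^*$ is a DF.
If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is the order statistic for $X_1, X_2, \ldots, X_n$, then clearly
$$F_n^*(x) = \begin{cases} 0 & \text{if } x < X_{(1)}, \\ \dfrac{k}{n} & \text{if } X_{(k)} \le x < X_{(k+1)} \quad (k = 1, 2, \ldots, n-1), \\ 1 & \text{if } x \ge X_{(n)}. \end{cases} \tag{1}$$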


For fixed but otherwise arbitrary $x \in \mathbb{R}$, $F_n^*(x)$ itself is an RV of the discrete type. The following result is immediate.


Theorem 1. The RV $F_n^*(x)$ has the probability function
$$P\left\{F_n^*(x) = \frac{j}{n}\right\} = \binom{n}{j} [F(x)]^j [1 - F(x)]^{n-j}, \quad j = 0, 1, \ldots, n, \tag{2}$$
with mean
$$E F_n^*(x) = F(x) \tag{3}$$
and variance
$$\operatorname{var} F_n^*(x) = \frac{F(x)[1 - F(x)]}{n}. \tag{4}$$


Proof. Since $\varepsilon(x - X_j)$, $j = 1, 2, \ldots, n$, are iid RVs, each with PMF
$$P\{\varepsilon(x - X_j) = 1\} = P\{x - X_j \ge 0\} = F(x)$$
and
$$P\{\varepsilon(x - X_j) = 0\} = 1 - F(x),$$
their sum $nF_n^*(x)$ is a $b(n, p)$ RV, where $p = F(x)$. Relations (2), (3), and (4) follow immediately.
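As a quick check of Theorem 1, the following Python sketch builds the PMF (2) in exact rational arithmetic for one illustrative pair of values, $n = 10$ and $F(x) = 3/10$. These values, and the helper names `ecdf_pmf` and `mean_and_variance`, are assumptions made for the example. The computed mean and variance should reproduce (3) and (4).

```python
from fractions import Fraction
from math import comb


def ecdf_pmf(n, Fx):
    """PMF (2): P{F_n^*(x) = j/n} = C(n, j) F(x)^j (1 - F(x))^(n - j)."""
    return {Fraction(j, n): comb(n, j) * Fx**j * (1 - Fx) ** (n - j) for j in range(n + 1)}


def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * pr for v, pr in pmf.items())
    var = sum((v - mean) ** 2 * pr for v, pr in pmf.items())
    return mean, var


n, Fx = 10, Fraction(3, 10)          # illustrative values, not taken from the text
pmf = ecdf_pmf(n, Fx)
mean, var = mean_and_variance(pmf)
print("total probability:", sum(pmf.values()))
print("mean:", mean, " F(x):", Fx)
print("variance:", var, " F(x)(1 - F(x))/n:", Fx * (1 - Fx) / n)
```

Exact fractions are used so that the comparison with (3) and (4) is an equality rather than a floating-point approximation.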


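Corollary 1. For each $x \in \mathbb{R}$,
$$F_n^*(x) \xrightarrow{P} F(x) \quad \text{as } n \to \infty.$$

Corollary 2. For each $x \in \mathbb{R}$,
$$\frac{\sqrt{n}\,[F_n^*(x) - F(x)]}{\sqrt{F(x)[1 - F(x)]}} \xrightarrow{L} Z,$$
where $Z$ is $N(0, 1)$.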


Corollary 1 follows from the WLLN and Corollary 2 from the CLT. The con- 
vergence in Corollary 1 is for each value of x. It is possible to make a probability 
statement simultaneously for all x. We state the result without proof. 


Theorem 2 (Glivenko-Cantelli Theorem). $F_n^*(x)$ converges uniformly to $F(x)$, that is, for every $\varepsilon > 0$,
$$\lim_{n \to \infty} P\left\{\sup_{-\infty < x < \infty} |F_n^*(x) - F(x)| > \varepsilon\right\} = 0.$$


For a proof of Theorem 2, we refer to Fisz [28, p. 391]. 
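The uniform convergence in Theorem 2 can be watched in a small simulation. For data from $U(0, 1)$ we have $F(x) = x$, and because $F_n^*$ is a step function the supremum is attained at, or just before, one of the jump points. The sketch below, a hypothetical helper `sup_distance_uniform` with an arbitrarily chosen seed, computes $\sup_x |F_n^*(x) - x|$ for increasing $n$. The printed distances should shrink roughly like $1/\sqrt{n}$.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_n^*(x) - x| for data from U(0, 1).

    F_n^* jumps from i/n to (i + 1)/n at the (i + 1)th smallest value u_i, so the
    supremum is max over i of max((i + 1)/n - u_i, u_i - i/n)."""
    u = sorted(sample)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


rng = random.Random(2024)            # arbitrary seed, for reproducibility only
results = {}
for n in (10, 100, 1000, 10000):
    results[n] = sup_distance_uniform([rng.random() for _ in range(n)])
    print(n, round(results[n], 4))
```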

We next consider some typical values of the DF $F_n^*(x)$, called sample statistics. Since $F_n^*(x)$ has jump points $X_j$, $j = 1, 2, \ldots, n$, it is clear that all moments of $F_n^*(x)$ exist. Let us write
$$a_k = n^{-1} \sum_{j=1}^{n} X_j^k \tag{5}$$
for the moment of order $k$ about 0. Here $a_k$ will be called the sample moment of order $k$. In this notation


312 SAMPLE MOMENTS AND THEIR DISTRIBUTIONS 
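$$a_1 = n^{-1} \sum_{j=1}^{n} X_j = \bar X. \tag{6}$$
The sample central moment is defined by
$$b_k = n^{-1} \sum_{j=1}^{n} (X_j - a_1)^k = n^{-1} \sum_{j=1}^{n} (X_j - \bar X)^k. \tag{7}$$
Clearly,
$$b_1 = 0 \quad \text{and} \quad b_2 = \frac{n-1}{n} S^2.$$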





As mentioned earlier, we do not call $b_2$ the sample variance. $S^2$ will be referred to as the sample variance, for reasons that will subsequently become clear. We have
$$b_2 = a_2 - a_1^2. \tag{8}$$
For the MGF of the DF $F_n^*(x)$, we have
$$M^*(t) = n^{-1} \sum_{j=1}^{n} e^{tX_j}. \tag{9}$$
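Definitions (5) to (8) translate directly into code. The sketch below uses a small hypothetical data set and exact fractions. The function names `a`, `b`, and `sample_variance` are illustrative, not from the text. It checks $b_1 = 0$, relation (8), and $b_2 = (n-1)S^2/n$.

```python
from fractions import Fraction


def a(xs, k):
    """Sample moment of order k about 0, as in (5)."""
    return sum(Fraction(x) ** k for x in xs) / len(xs)


def b(xs, k):
    """Sample central moment of order k, as in (7)."""
    xbar = a(xs, 1)
    return sum((Fraction(x) - xbar) ** k for x in xs) / len(xs)


def sample_variance(xs):
    """S^2 = n b_2 / (n - 1)."""
    n = len(xs)
    return b(xs, 2) * n / (n - 1)


xs = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data
n = len(xs)
print("a_1 =", a(xs, 1), " b_1 =", b(xs, 1))
print("b_2 =", b(xs, 2), " a_2 - a_1^2 =", a(xs, 2) - a(xs, 1) ** 2)
print("S^2 =", sample_variance(xs))
```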


Similar definitions are made for sample moments of bivariate and multivariate 
distributions. For example, if (X1, Y1), (X2, Y2), ... , (Xn, Yn) is a sample from a 
bivariate distribution, we write 


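$$\bar X = n^{-1} \sum_{j=1}^{n} X_j \quad \text{and} \quad \bar Y = n^{-1} \sum_{j=1}^{n} Y_j \tag{10}$$
for the two sample means, and for the second-order sample central moments we write
$$b_{20} = n^{-1} \sum_{j=1}^{n} (X_j - \bar X)^2, \quad b_{02} = n^{-1} \sum_{j=1}^{n} (Y_j - \bar Y)^2, \quad \text{and} \quad b_{11} = n^{-1} \sum_{j=1}^{n} (X_j - \bar X)(Y_j - \bar Y). \tag{11}$$
Once again we write
$$S_1^2 = (n-1)^{-1} \sum_{j=1}^{n} (X_j - \bar X)^2 \quad \text{and} \quad S_2^2 = (n-1)^{-1} \sum_{j=1}^{n} (Y_j - \bar Y)^2 \tag{12}$$
for the two sample variances, and for the sample covariance we use the quantity
$$S_{11} = (n-1)^{-1} \sum_{j=1}^{n} (X_j - \bar X)(Y_j - \bar Y). \tag{13}$$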
In particular, the sample correlation coefficient is defined by
$$R = \frac{S_{11}}{S_1 S_2} = \frac{b_{11}}{\sqrt{b_{20}\, b_{02}}}. \tag{14}$$
It can be shown (Problem 4) that $|R| \le 1$; the extreme values $\pm 1$ can occur only when all sample points $(X_1, Y_1), \ldots, (X_n, Y_n)$ lie on a straight line.
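A short sketch of (14), using floating-point arithmetic and hypothetical data, illustrates the statement about $\pm 1$: points on a line with positive slope give $R = 1$, points on a line with negative slope give $R = -1$, and non-collinear points give $|R| < 1$. The helper name `sample_correlation` is illustrative.

```python
import math


def sample_correlation(xs, ys):
    """R = S_11 / (S_1 S_2), as in (12) to (14); the (n - 1) divisors cancel."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    s1 = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s2 = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return s11 / (s1 * s2)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]                                # hypothetical data
r_up = sample_correlation(xs, [3 * x - 2 for x in xs])        # line with slope 3
r_down = sample_correlation(xs, [-0.5 * x + 7 for x in xs])   # line with slope -1/2
r_other = sample_correlation(xs, [2.0, 1.0, 4.0, 3.0, 5.0])   # not collinear
print(r_up, r_down, r_other)
```

Rounding can make the collinear cases differ from $\pm 1$ in the last binary digit, so comparisons should use a tolerance.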

The sample quantiles are defined in a similar manner. Thus, if $0 < p < 1$, the sample quantile of order $p$, denoted by $Z_p$, is the order statistic $X_{(r)}$, where
$$r = \begin{cases} np & \text{if } np \text{ is an integer}, \\ [np] + 1 & \text{if } np \text{ is not an integer}. \end{cases}$$
As usual, $[x]$ is the largest integer $\le x$. Note that if $np$ is an integer, we can take any value between $X_{(np)}$ and $X_{(np+1)}$ as the $p$th sample quantile. Thus, if $p = \frac12$ and $n$ is even, we can take any value between $X_{(n/2)}$ and $X_{(n/2+1)}$, the two middle values, as the median. It is customary to take the average. Thus the sample median is defined as
$$Z_{1/2} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd}, \\ \dfrac{X_{(n/2)} + X_{(n/2+1)}}{2} & \text{if } n \text{ is even}. \end{cases} \tag{15}$$
Note that $[n/2] + 1 = (n+1)/2$ if $n$ is odd.


Example 1. A random sample of 25 observations is taken from the interval (0,1): 


0.50 0.24 0.89 0.54 0.34 0.89 0.92 0.17 0.32 0.80 
0.06 0.21 0.58 0.07 0.56 0.20 0.31 0.17 0.41 0.38 
0.88 0.61 0.35 0.06 0.90 


In order to compute $F_{25}^*$, the first step is to order the observations from smallest to largest. The ordered sample is


0.06, 0.06, 0.07, 0.17, 0.17, 0.20, 0.21, 0.24, 0.31, 0.32,
0.34, 0.35, 0.38, 0.41, 0.50, 0.54, 0.56, 0.58, 0.61, 0.80, 
0.88, 0.89, 0.89, 0.90, 0.92 


Then the empirical DF is given by 
$$F_{25}^*(x) = \begin{cases} 0, & x < 0.06, \\ 2/25, & 0.06 \le x < 0.07, \\ 3/25, & 0.07 \le x < 0.17, \\ 5/25, & 0.17 \le x < 0.20, \\ \quad\vdots & \qquad\vdots \\ 24/25, & 0.90 \le x < 0.92, \\ 1, & x \ge 0.92. \end{cases}$$

Fig. 1. Empirical DF for data of Example 1.


A plot of $F_{25}^*$ is shown in Fig. 1. The sample mean and variance are
$$\bar X = 0.45, \quad S^2 = 0.084, \quad \text{and} \quad S = 0.29.$$
Also, the sample median is the 13th observation in the ordered sample, namely, $Z_{1/2} = 0.38$, and if $p = 0.2$, then $np = 5$ and $Z_{0.2} = X_{(5)} = 0.17$.
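The computations of Example 1 can be reproduced with a short Python sketch. The data are those listed in the example. The helpers `ecdf`, `sample_quantile`, and `sample_median` are hypothetical names that implement Definition 1, the rule for $r$ given above, and (15).

```python
import math
from fractions import Fraction

# Data of Example 1 (n = 25).
data = [0.50, 0.24, 0.89, 0.54, 0.34, 0.89, 0.92, 0.17, 0.32, 0.80,
        0.06, 0.21, 0.58, 0.07, 0.56, 0.20, 0.31, 0.17, 0.41, 0.38,
        0.88, 0.61, 0.35, 0.06, 0.90]


def ecdf(sample):
    """Return the sample DF F_n^*: the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: Fraction(sum(1 for v in xs if v <= x), n)


def sample_quantile(sample, p):
    """Order statistic X_(r) with r = np if np is an integer, else [np] + 1."""
    xs = sorted(sample)
    np_ = Fraction(p) * len(xs)
    r = np_.numerator if np_.denominator == 1 else math.floor(np_) + 1
    return xs[r - 1]


def sample_median(sample):
    """Definition (15): the middle value, or the average of the two middle values."""
    xs = sorted(sample)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


n = len(data)
F25 = ecdf(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
print("F(0.065) =", F25(0.065), " F(0.17) =", F25(0.17), " F(0.91) =", F25(0.91))
print("mean =", round(xbar, 2), " S^2 =", round(s2, 3), " S =", round(math.sqrt(s2), 2))
print("median =", sample_median(data), " Z_0.2 =", sample_quantile(data, Fraction(1, 5)))
```

Passing $p$ as a `Fraction` keeps the test "is $np$ an integer?" exact. With a float such as `0.2`, the product $np$ could be off by a rounding error.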


Next we consider the moments of sample characteristics. In the following we write $EX^k = m_k$ and $E(X - \mu)^k = \mu_k$ for the $k$th-order population moments. Whenever we use $m_k$ (or $\mu_k$), it will be assumed to exist. Also, $\sigma^2$ represents the population variance.


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with DF $F$. Then
$$E\bar X = \mu, \tag{16}$$
$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n}, \tag{17}$$
$$E(\bar X^3) = \frac{m_3 + 3(n-1)m_2\mu + (n-1)(n-2)\mu^3}{n^2}, \tag{18}$$
and
$$E(\bar X^4) = \frac{m_4 + 4(n-1)m_3\mu + 6(n-1)(n-2)m_2\mu^2 + 3(n-1)m_2^2}{n^3} + \frac{(n-1)(n-2)(n-3)\mu^4}{n^3}. \tag{19}$$





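Proof. In view of Theorems 4.5.3 and 4.5.7, it suffices to prove (18) and (19). We have
$$\left(\sum_{j=1}^{n} X_j\right)^3 = \sum_{j=1}^{n} X_j^3 + 3\sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l,$$
where in each sum the indices are distinct, and (18) follows on taking expectations. Similarly,
$$\left(\sum_{i=1}^{n} X_i\right)^4 = \left(\sum_{i} X_i\right)\left(\sum_{j} X_j^3 + 3\sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l\right)$$
$$= \sum_{i} X_i^4 + 4\sum_{i \ne j} X_i^3 X_j + 3\sum_{i \ne j} X_i^2 X_j^2 + 6\sum_{i \ne j \ne k} X_i^2 X_j X_k + \sum_{i \ne j \ne k \ne l} X_i X_j X_k X_l,$$
and (19) follows.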


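Theorem 4. For the third and fourth central moments of $\bar X$, we have
$$\mu_3(\bar X) = \frac{\mu_3}{n^2} \tag{20}$$
and
$$\mu_4(\bar X) = \frac{\mu_4}{n^3} + \frac{3(n-1)\mu_2^2}{n^3}. \tag{21}$$

Proof. Every cross term containing some factor $X_i - \mu$ to the first power has zero expectation, so
$$\mu_3(\bar X) = E(\bar X - \mu)^3 = \frac{1}{n^3} E\left[\sum_{i=1}^{n} (X_i - \mu)\right]^3 = \frac{1}{n^3} \sum_{i=1}^{n} E(X_i - \mu)^3 = \frac{\mu_3}{n^2},$$
and
$$\mu_4(\bar X) = E(\bar X - \mu)^4 = \frac{1}{n^4} E\left[\sum_{i=1}^{n} (X_i - \mu)\right]^4 = \frac{1}{n^4}\left[\sum_{i=1}^{n} E(X_i - \mu)^4 + \binom{4}{2}\sum_{i<j} E(X_i - \mu)^2 E(X_j - \mu)^2\right]$$
$$= \frac{\mu_4}{n^3} + \frac{3(n-1)\mu_2^2}{n^3}.$$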





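Theorem 5. For the moments of $b_2$, we have
$$E(b_2) = \frac{(n-1)\sigma^2}{n}, \tag{22}$$
$$\operatorname{var}(b_2) = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3}, \tag{23}$$
$$E(b_3) = \frac{(n-1)(n-2)}{n^2}\,\mu_3, \tag{24}$$
and
$$E(b_4) = \frac{(n-1)(n^2 - 3n + 3)}{n^3}\,\mu_4 + \frac{3(n-1)(2n-3)}{n^3}\,\mu_2^2. \tag{25}$$

Proof. We have
$$E(b_2) = \frac{1}{n} E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar X - \mu)^2\right] = \frac{1}{n}(n\sigma^2 - \sigma^2) = \frac{n-1}{n}\,\sigma^2.$$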
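Now
$$nb_2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar X - \mu)^2.$$
Writing $Y_i = X_i - \mu$, we see that $EY_i = 0$, $\operatorname{var}(Y_i) = \sigma^2$, and $EY_i^4 = \mu_4$. We have
$$n^2 E b_2^2 = E\left(\sum_{i=1}^{n} Y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} Y_i\Big)^2\right)^2 = E\Big(\sum_i Y_i^2\Big)^2 - \frac{2}{n} E\left[\Big(\sum_i Y_i\Big)^2 \sum_j Y_j^2\right] + \frac{1}{n^2} E\Big(\sum_i Y_i\Big)^4,$$
where
$$E\Big(\sum_i Y_i^2\Big)^2 = n\mu_4 + n(n-1)\sigma^4, \qquad E\left[\Big(\sum_i Y_i\Big)^2 \sum_j Y_j^2\right] = n\mu_4 + n(n-1)\sigma^4,$$
and, as in the proof of Theorem 4, $E(\sum_i Y_i)^4 = n\mu_4 + 3n(n-1)\sigma^4$. It follows that
$$n^2 E b_2^2 = n\mu_4 + n(n-1)\sigma^4 - \frac{2}{n}[n(n-1)\sigma^4 + n\mu_4] + \frac{1}{n^2}[3n(n-1)\sigma^4 + n\mu_4]$$
$$= \left(n - 2 + \frac{1}{n}\right)\mu_4 + (n-1)\left(n - 2 + \frac{3}{n}\right)\mu_2^2,$$
since $\mu_2 = \sigma^2$. Therefore,
$$\operatorname{var}(b_2) = Eb_2^2 - (Eb_2)^2 = \frac{1}{n^2}\left[\left(n - 2 + \frac{1}{n}\right)\mu_4 + (n-1)\left(n - 2 + \frac{3}{n}\right)\mu_2^2\right] - \frac{(n-1)^2}{n^2}\,\mu_2^2$$
$$= \left(\frac{1}{n} - \frac{2}{n^2} + \frac{1}{n^3}\right)\mu_4 - \left(\frac{1}{n} - \frac{4}{n^2} + \frac{3}{n^3}\right)\mu_2^2,$$
as asserted.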
Relations (24) and (25) can be proved similarly. 


Corollary 1. $ES^2 = \sigma^2$.
This is precisely the reason why we call $S^2$, and not $b_2$, the sample variance.

Corollary 2. $\operatorname{var}(S^2) = \dfrac{\mu_4}{n} - \dfrac{(n-3)\mu_2^2}{n(n-1)}$.
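Theorems 3 to 5 and both corollaries can be checked exactly in a case where every expectation is a finite sum. In the following sketch the population is $b(1, p)$, with $p = 1/3$ and $n = 5$ chosen only for illustration. Each expectation is computed by enumerating all $2^n$ samples, with exact fractions. For this population $m_k = EX^k = p$ for every $k$, $\mu_2 = p(1-p)$, $\mu_3 = p(1-p)(1-2p)$, and $\mu_4 = p(1-p)[p^3 + (1-p)^3]$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expect(f, n, p):
    """E f(X_1, ..., X_n) for iid b(1, p), summing exactly over all 2^n samples."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        total += p**k * (1 - p) ** (n - k) * f(xs)
    return total


def xbar(xs):
    return Fraction(sum(xs), len(xs))


def b(xs, k):
    """Sample central moment b_k of (7)."""
    m = xbar(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def s2(xs):
    """Sample variance S^2 = n b_2 / (n - 1)."""
    return b(xs, 2) * len(xs) / (len(xs) - 1)


n, p = 5, Fraction(1, 3)             # illustrative choices
q = 1 - p
mu, mu2 = p, p * q                   # population mean and variance
mu3 = p * q * (q - p)                # third central moment of b(1, p)
mu4 = p * q * (q**3 + p**3)          # fourth central moment of b(1, p)

Eb2 = expect(lambda xs: b(xs, 2), n, p)
checks = {
    "(17)": (expect(lambda xs: (xbar(xs) - mu) ** 2, n, p), mu2 / n),
    "(18)": (expect(lambda xs: xbar(xs) ** 3, n, p),
             (p + 3 * (n - 1) * p * mu + (n - 1) * (n - 2) * mu**3) / n**2),
    "(20)": (expect(lambda xs: (xbar(xs) - mu) ** 3, n, p), mu3 / n**2),
    "(21)": (expect(lambda xs: (xbar(xs) - mu) ** 4, n, p), (mu4 + 3 * (n - 1) * mu2**2) / n**3),
    "(22)": (Eb2, (n - 1) * mu2 / n),
    "(23)": (expect(lambda xs: b(xs, 2) ** 2, n, p) - Eb2**2,
             (mu4 - mu2**2) / n - 2 * (mu4 - 2 * mu2**2) / n**2 + (mu4 - 3 * mu2**2) / n**3),
    "(24)": (expect(lambda xs: b(xs, 3), n, p), (n - 1) * (n - 2) * mu3 / n**2),
    "(25)": (expect(lambda xs: b(xs, 4), n, p),
             ((n - 1) * (n**2 - 3 * n + 3) * mu4 + 3 * (n - 1) * (2 * n - 3) * mu2**2) / n**3),
    "Cor. 1": (expect(s2, n, p), mu2),
    "Cor. 2": (expect(lambda xs: s2(xs) ** 2, n, p) - mu2**2,
               mu4 / n - (n - 3) * mu2**2 / (n * (n - 1))),
}
for name, (exact, formula) in checks.items():
    print(f"{name:7s} exact = {exact}, formula = {formula}, equal: {exact == formula}")
```

A single Bernoulli case cannot prove the general identities. It does catch sign and coefficient slips, because all comparisons are exact.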


Remark 1. The results of Theorems 3 to 5 can easily be modified and stated for the case when the $X_i$'s are exchangeable RVs. Thus (16) holds and (17) has to be modified to
$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\sigma^2, \tag{17'}$$
where $\rho$ is the correlation coefficient between $X_i$ and $X_j$ $(i \ne j)$. The expressions for $(\sum X_j)^3$ and $(\sum X_j)^4$ in the proof of Theorem 3 still hold, but both (18) and (19) need appropriate modification. For example, (18) changes to
$$E\bar X^3 = \frac{m_3 + 3(n-1)E(X_1^2 X_2) + (n-1)(n-2)E(X_1 X_2 X_3)}{n^2}. \tag{18'}$$
Let us show how Corollary 1 changes for exchangeable RVs. Clearly,
$$(n-1)S^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar X - \mu)^2,$$
so that
$$(n-1)ES^2 = n\sigma^2 - nE(\bar X - \mu)^2 = n\sigma^2 - [\sigma^2 + (n-1)\rho\sigma^2]$$
in view of (17'). It follows that
$$ES^2 = \sigma^2(1 - \rho).$$
We note that $E(S^2 - \sigma^2) = -\rho\sigma^2$, and moreover, from Problem 4.5.19 [or from (17'), since $\operatorname{var}(\bar X) \ge 0$] we note that $\rho \ge -1/(n-1)$, so that $1 - \rho \le n/(n-1)$, and hence
$$0 \le ES^2 \le \frac{n}{n-1}\,\sigma^2.$$


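Remark 2. In simple random sampling from a (finite) population of size $N$, we note that when $n = N$, $\bar X = \mu$, which is a constant, so that (17') reduces to
$$0 = \frac{\sigma^2}{N} + \frac{N-1}{N}\,\rho\sigma^2,$$
so that $\rho = -1/(N-1)$. It follows that
$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n} - \frac{n-1}{n}\cdot\frac{\sigma^2}{N-1} = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}. \tag{17''}$$
The factor $(N - n)/(N - 1)$ in (17'') is called the finite population correction factor. As $N \to \infty$, with $n$ fixed, $(N - n)/(N - 1) \to 1$, so that the expression for $\operatorname{var}(\bar X)$ in (17'') approaches that in (17).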
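Formula (17'') can be confirmed by brute force. The sketch below lists every sample of size $n$ drawn without replacement from a small hypothetical population of $N = 5$ values. It computes the exact variance of $\bar X$ over these equally likely samples and compares it with $(N-n)\sigma^2/[(N-1)n]$. Here $\sigma^2$ uses divisor $N$, as it must for the argument above (a single draw has variance $\sigma^2$).

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 7, 11]        # hypothetical finite population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((y - mu) ** 2 for y in population) / N    # population variance, divisor N


def var_of_mean_srs(n):
    """Exact var(Xbar) over all C(N, n) equally likely samples drawn without replacement."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    centre = sum(means) / len(means)           # equals mu
    return sum((m - centre) ** 2 for m in means) / len(means)


for n in range(1, N + 1):
    exact = var_of_mean_srs(n)
    formula = Fraction(N - n, N - 1) * sigma2 / n     # (17'')
    print(n, exact, formula, exact == formula)
```

At $n = N$ both sides are 0, which is exactly the condition used above to find $\rho = -1/(N-1)$.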
The following result provides a justification for our definition of sample covariance.


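Theorem 6. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population with variances $\sigma_1^2, \sigma_2^2$ and covariance $\rho\sigma_1\sigma_2$. Then
$$ES_1^2 = \sigma_1^2, \quad ES_2^2 = \sigma_2^2, \quad \text{and} \quad ES_{11} = \rho\sigma_1\sigma_2, \tag{26}$$
where $S_1^2$, $S_2^2$, and $S_{11}$ are defined in (12) and (13).

Proof. It follows from Corollary 1 to Theorem 5 that $ES_1^2 = \sigma_1^2$ and $ES_2^2 = \sigma_2^2$.

To prove that $ES_{11} = \rho\sigma_1\sigma_2$, we note that $X_i$ is independent of $X_j$ $(i \ne j)$ and of $Y_j$ $(i \ne j)$. We have
$$(n-1)ES_{11} = E\left[\sum_{j=1}^{n} (X_j - \bar X)(Y_j - \bar Y)\right].$$
Now
$$E[(X_j - \bar X)(Y_j - \bar Y)] = E\left[X_jY_j - \frac{1}{n}X_j\sum_{i=1}^{n} Y_i - \frac{1}{n}Y_j\sum_{i=1}^{n} X_i + \frac{1}{n^2}\sum_{i=1}^{n} X_i \sum_{k=1}^{n} Y_k\right]$$
$$= E(XY) - \frac{1}{n}[E(XY) + (n-1)EX\,EY] - \frac{1}{n}[E(XY) + (n-1)EX\,EY] + \frac{1}{n^2}[nE(XY) + n(n-1)EX\,EY]$$
$$= \frac{n-1}{n}\,[E(XY) - EX\,EY],$$
and it follows that
$$(n-1)ES_{11} = n\cdot\frac{n-1}{n}\,[E(XY) - EX\,EY],$$
that is,
$$ES_{11} = E(XY) - EX\,EY = \operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2,$$
as asserted.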
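Theorem 6 can be checked exactly for a small discrete bivariate population. In the sketch below, the population is hypothetical: it puts probability 1/2, 1/4, 1/4 on the points (0, 1), (1, 0), (2, 3). The sketch sums $S_{11}$ over every ordered iid sample of size $n$, weighted by its probability, and compares the result with $\operatorname{cov}(X, Y)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical bivariate population: (x, y) support points with their probabilities.
support = [((0, 1), Fraction(1, 2)), ((1, 0), Fraction(1, 4)), ((2, 3), Fraction(1, 4))]


def population_cov():
    ex = sum(w * x for (x, y), w in support)
    ey = sum(w * y for (x, y), w in support)
    exy = sum(w * x * y for (x, y), w in support)
    return exy - ex * ey


def s11(points):
    """Sample covariance (13) with divisor n - 1."""
    n = len(points)
    xbar = Fraction(sum(x for x, _ in points), n)
    ybar = Fraction(sum(y for _, y in points), n)
    return sum((x - xbar) * (y - ybar) for x, y in points) / (n - 1)


def expected_s11(n):
    """E S_11, summing over all 3^n ordered iid samples."""
    total = Fraction(0)
    for draws in product(support, repeat=n):
        weight = Fraction(1)
        for _, w in draws:
            weight *= w
        total += weight * s11([pt for pt, _ in draws])
    return total


print("cov(X, Y) =", population_cov())
for n in (2, 3, 4):
    print(n, expected_s11(n))
```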


We next turn our attention to the distributions of sample characteristics. Several 
possibilities exist. If the exact sampling distribution is required, the method of trans- 
formation described in Section 4.4 can be used. Sometimes the technique of MGF or 
CF can be applied. Thus, if $X_1, X_2, \ldots, X_n$ is a random sample from a population distribution for which the MGF exists, the MGF of the sample mean $\bar X$ is given by
$$M_{\bar X}(t) = \prod_{i=1}^{n} E e^{tX_i/n} = \left[M\left(\frac{t}{n}\right)\right]^n, \tag{27}$$
where $M$ is the MGF of the population distribution. If $M_{\bar X}(t)$ has one of the known forms, it is possible to write the PDF of $\bar X$. Although this method has the obvious drawback that it applies only to distributions for which all moments exist, we will see in Section 7.6 its effectiveness in the important case of sampling from a normal population where this condition is satisfied. An analog of (27) holds for CFs without any condition on existence of moments. Indeed,
$$\varphi_{\bar X}(t) = \prod_{j=1}^{n} \varphi\left(\frac{t}{n}\right) = \left[\varphi\left(\frac{t}{n}\right)\right]^n, \tag{28}$$
where $\varphi$ is the CF of $X_j$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(\alpha, 1)$ distribution. We will compute the PDF of $\bar X$. We have
$$M_{\bar X}(t) = \left[M\left(\frac{t}{n}\right)\right]^n = \frac{1}{(1 - t/n)^{\alpha n}}, \quad t < n,$$
so that $\bar X$ is a $G(\alpha n, 1/n)$ variate.


Example 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution on $(0, 1)$. Consider the geometric mean
$$Y_n = \left(\prod_{i=1}^{n} X_i\right)^{1/n}.$$
We have $\log Y_n = (1/n)\sum_{i=1}^{n} \log X_i$, so that $\log Y_n$ is the mean of $\log X_1, \ldots, \log X_n$. The common PDF of $\log X_1, \ldots, \log X_n$ is
$$f(x) = \begin{cases} e^x & \text{if } x < 0, \\ 0 & \text{otherwise}, \end{cases}$$
which is the negative exponential distribution with parameter $\beta = 1$. We see that the MGF of $\log Y_n$ is given by
$$M(t) = \prod_{i=1}^{n} E e^{t\log X_i/n} = \frac{1}{(1 + t/n)^n}, \quad t > -n,$$
and the PDF of $\log Y_n$ is given by
$$f^*(x) = \begin{cases} \dfrac{n^n}{\Gamma(n)}\,(-x)^{n-1} e^{nx}, & -\infty < x < 0, \\ 0, & \text{otherwise}. \end{cases}$$
It follows that $Y_n$ has PDF
$$f_{Y_n}(y) = \begin{cases} \dfrac{n^n}{\Gamma(n)}\, y^{n-1} (-\log y)^{n-1}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$
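As a numerical sanity check on Example 3, the sketch below integrates the PDF of $Y_n$ with a composite midpoint rule for $n = 3$, an illustrative choice. It confirms that the PDF integrates to 1. It also compares $EY_n$ with $M(1) = (1 + 1/n)^{-n} = [n/(n+1)]^n$, which follows from the MGF of $\log Y_n$ because $Y_n = e^{\log Y_n}$. The helpers `pdf_geometric_mean` and `midpoint_integral` are hypothetical names.

```python
import math


def pdf_geometric_mean(y, n):
    """PDF of Y_n = (X_1 ... X_n)^(1/n) for X_i iid U(0, 1), from Example 3."""
    if not 0.0 < y < 1.0:
        return 0.0
    return n**n * y ** (n - 1) * (-math.log(y)) ** (n - 1) / math.gamma(n)


def midpoint_integral(f, a, b, m=200_000):
    """Composite midpoint rule on [a, b] with m subintervals."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))


n = 3
total = midpoint_integral(lambda y: pdf_geometric_mean(y, n), 0.0, 1.0)
mean = midpoint_integral(lambda y: y * pdf_geometric_mean(y, n), 0.0, 1.0)
print("integral of PDF:", round(total, 6))
print("E Y_n numerically:", round(mean, 6), " (n/(n+1))^n:", round((n / (n + 1)) ** n, 6))
```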


Example 4 (Hogben [43]). Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli distribution with parameter $p$, $0 < p < 1$. Let $\bar X$ be the sample mean and $S^2$ the sample variance. We will find the PMF of $S^2$. Note that $S_n = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i^2$ and that $S_n$ is $b(n, p)$. Since
$$(n-1)S^2 = \sum_{i=1}^{n} X_i^2 - n\bar X^2 = S_n - \frac{S_n^2}{n},$$
$S^2$ only assumes values of the form
$$t_i = \frac{i(n-i)}{n(n-1)}, \quad i = 0, 1, 2, \ldots, \left[\frac{n}{2}\right],$$
where $[x]$ is the largest integer $\le x$. Thus
$$P\{S^2 = t_i\} = P\{nS_n - S_n^2 = i(n-i)\} = P\left\{\left(S_n - \frac{n}{2}\right)^2 = \left(i - \frac{n}{2}\right)^2\right\}$$
$$= P\{S_n = i \text{ or } S_n = n - i\} = \binom{n}{i} p^i (1-p)^{n-i} + \binom{n}{n-i} p^{n-i} (1-p)^i$$
$$= \binom{n}{i} p^i (1-p)^i \left[(1-p)^{n-2i} + p^{n-2i}\right], \quad i < \frac{n}{2}.$$
If $n$ is even, $n = 2m$, say, where $m > 0$ is an integer, and $i = m$, then the events $\{S_n = i\}$ and $\{S_n = n - i\}$ coincide, and
$$P\left\{S^2 = \frac{m}{2(2m-1)}\right\} = \binom{2m}{m} p^m (1-p)^m.$$
In particular, if $n = 7$, $S^2 = 0, \frac17, \frac{5}{21}$, and $\frac27$ with probabilities $\{p^7 + (1-p)^7\}$, $7p(1-p)\{p^5 + (1-p)^5\}$, $21p^2(1-p)^2\{p^3 + (1-p)^3\}$, and $35p^3(1-p)^3$, respectively. If $n = 6$, then $S^2 = 0, \frac16, \frac{4}{15}$, and $\frac{3}{10}$ with probabilities $\{p^6 + (1-p)^6\}$, $6p(1-p)\{p^4 + (1-p)^4\}$, $15p^2(1-p)^2\{p^2 + (1-p)^2\}$, and $20p^3(1-p)^3$, respectively.
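The PMF of $S^2$ in Example 4 can be tabulated mechanically from the $b(n, p)$ distribution of $S_n$, using $S^2 = S_n(n - S_n)/[n(n-1)]$. The values $S_n = i$ and $S_n = n - i$ fall on the same point. The sketch below, which uses the hypothetical helper `pmf_s2_bernoulli` and the illustrative value $p = 1/3$, reproduces the $n = 7$ and $n = 6$ cases. For $n = 6$ the probability at $S^2 = 3/10$ comes out as $\binom{6}{3}p^3(1-p)^3 = 20p^3(1-p)^3$, as the case $i = m$ requires.

```python
from fractions import Fraction
from math import comb


def pmf_s2_bernoulli(n, p):
    """PMF of S^2 for a b(1, p) sample of size n, built from S_n ~ b(n, p).

    S^2 = S_n (n - S_n) / (n (n - 1)), so S_n = i and S_n = n - i give the same value."""
    pmf = {}
    for s in range(n + 1):
        t = Fraction(s * (n - s), n * (n - 1))
        pmf[t] = pmf.get(t, 0) + comb(n, s) * p**s * (1 - p) ** (n - s)
    return pmf


p = Fraction(1, 3)                   # illustrative value
q = 1 - p
pmf7 = pmf_s2_bernoulli(7, p)
pmf6 = pmf_s2_bernoulli(6, p)
for t in sorted(pmf7):
    print("n = 7:", t, pmf7[t])
for t in sorted(pmf6):
    print("n = 6:", t, pmf6[t])
```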


We have already considered the distribution of the sample quantiles in Section 4.7 and the distribution of range $X_{(n)} - X_{(1)}$ in Example 4.7.4. It can be shown without much difficulty that the distribution of the sample median is given by
$$f_r(y) = \frac{n!}{(r-1)!\,(n-r)!}\,[F(y)]^{r-1}\,[1 - F(y)]^{n-r} f(y) \quad \text{if } r = \frac{n+1}{2}, \tag{29}$$
where $F$ and $f$ are the population DF and PDF, respectively. If $n = 2m$ and the median is taken as the average of $X_{(m)}$ and $X_{(m+1)}$, then
$$f_r(y) = \frac{2(2m)!}{[(m-1)!]^2} \int_y^{\infty} [F(2y - v)]^{m-1}\,[1 - F(v)]^{m-1} f(2y - v) f(v)\, dv. \tag{30}$$


Example 5. Let $X_1, X_2, \ldots, X_n$, $n = 2m$, be a random sample from $U(0, 1)$. Then the integrand in (30) is positive for the intersection of the regions $0 < 2y - v < 1$ and $0 < v < 1$. This gives $v/2 < y < (v + 1)/2$, $y < v$, and $0 < v < 1$. The shaded area in Fig. 2 gives the limits on the integral as
$$y < v < 2y \quad \text{if } 0 < y \le \tfrac12,$$
and
$$y < v < 1 \quad \text{if } \tfrac12 < y < 1.$$

Fig. 2. ($y < v < 2y$, $0 < y \le \frac12$ and $y < v < 1$, $\frac12 < y < 1$.)

In particular, if $m = 2$, the PDF of the median, $(X_{(2)} + X_{(3)})/2$, is given by
$$f(y) = \begin{cases} 8y^2(3 - 4y) & \text{if } 0 < y < \frac12, \\ 8(4y^3 - 9y^2 + 6y - 1) & \text{if } \frac12 \le y < 1, \\ 0 & \text{otherwise}. \end{cases}$$
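The closed-form PDF for $m = 2$ can be compared with a direct numerical evaluation of (30). For $U(0, 1)$ we have $F(u) = u$ and $f(u) = 1$ on $(0, 1)$, and the limits of integration are those found in Example 5. The sketch below uses a midpoint rule. Its helper names are hypothetical, and the constant $2(2m)!/[(m-1)!]^2$ equals 48 when $m = 2$.

```python
import math


def median_pdf_closed_form(y):
    """PDF of (X_(2) + X_(3))/2 for a sample of size 4 from U(0, 1), as stated above."""
    if 0 < y < 0.5:
        return 8 * y**2 * (3 - 4 * y)
    if 0.5 <= y < 1:
        return 8 * (4 * y**3 - 9 * y**2 + 6 * y - 1)
    return 0.0


def median_pdf_from_30(y, m=2, steps=20_000):
    """Evaluate (30) numerically for U(0, 1), where F(u) = u and f(u) = 1 on (0, 1)."""
    if not 0 < y < 1:
        return 0.0
    coef = 2 * math.factorial(2 * m) / math.factorial(m - 1) ** 2
    lo, hi = y, min(2 * y, 1.0)      # limits of integration from Example 5
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * h
        u = 2 * y - v                # value of the smaller middle order statistic
        total += u ** (m - 1) * (1 - v) ** (m - 1)
    return coef * total * h


for y in (0.1, 0.25, 0.5, 0.6, 0.9):
    print(y, median_pdf_from_30(y), median_pdf_closed_form(y))
```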


In Section 7.5 we study large-sample theory techniques to approximate distribu- 
tions of sample statistics when n is large. 


PROBLEMS 7.3 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$, and let $F_n^*(x)$ be the sample distribution function. Find $\operatorname{cov}(F_n^*(x), F_n^*(y))$ for fixed real numbers $x, y$.

2. Let $F_n^*$ be the empirical DF of a random sample from DF $F$. Show that
$$P\left\{|F_n^*(x) - F(x)| \ge \frac{\varepsilon}{2\sqrt{n}}\right\} \le \frac{1}{\varepsilon^2} \quad \text{for all } \varepsilon > 0.$$

3. For the data of Example 7.2.2, compute the sample distribution function.

4. (a) Show that the sample correlation coefficient $R$ satisfies $|R| \le 1$ with equality if and only if all sample points lie on a straight line.
(b) If we write $U_i = aX_i + b$ $(a \ne 0)$ and $V_i = cY_i + d$ $(c \ne 0)$, what is the sample correlation coefficient between the $U$'s and the $V$'s?

5. (a) A sample of size 2 is taken from the PDF $f(x) = 1$, $0 < x < 1$, and $= 0$ otherwise. Find $P(\bar X \ge 0.9)$.
(b) A sample of size 2 is taken from $b(1, p)$. Find (i) $P(\bar X \le p)$, and (ii) $P(S^2 \ge 0.5)$.

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Compute the first four sample moments of $\bar X$ about the origin and about the mean. Also compute the first four sample moments of $S^2$ about the mean.

7. Derive the PDF of the median given in (29) and (30).

8. Let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the order statistics of a sample of size $n$ from $U(0, 1)$. Compute $EU_{(r)}^k$ for any $1 \le r \le n$ and integer $k$ $(> 0)$. In particular, show that
$$EU_{(r)} = \frac{r}{n+1} \quad \text{and} \quad \operatorname{var}(U_{(r)}) = \frac{r(n - r + 1)}{(n+1)^2(n+2)}.$$
Show also that the correlation coefficient between $U_{(r)}$ and $U_{(s)}$ for $1 \le r < s \le n$ is given by $[r(n - s + 1)/\{s(n - r + 1)\}]^{1/2}$.

9. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. Find the sampling distribution of $\bar X$, the sample mean, if (a) $X \sim P(\lambda)$, (b) $X \sim C(1, 0)$, and (c) $X \sim \chi^2(m)$.

10. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$. Let us write $Y_n = (\bar X - \alpha\beta)/(\beta\sqrt{\alpha/n})$, $n = 1, 2, \ldots$.
(a) Compute the first four moments of $Y_n$, and compare them with the first four moments of the standard normal distribution.
(b) Compute the coefficients of skewness $\alpha_3$ and of kurtosis $\alpha_4$ for the RVs $Y_n$. (For definitions of $\alpha_3$, $\alpha_4$, see Problem 3.2.10.)

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U[0, 1]$. Also let $Z_n = (\bar X - 0.5)/\sqrt{1/(12n)}$. Repeat Problem 10 for the sequence $Z_n$.

12. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find $\operatorname{var}(S^2)$, and compare it with $\operatorname{var}(\bar X)$. Note that $E\bar X = \lambda = ES^2$. (Hint: Use Problem 3.2.9.)

13. Prove (24) and (25).

14. Multiple RVs $X_1, X_2, \ldots, X_n$ are exchangeable if the $n!$ permutations $(X_{i_1}, X_{i_2}, \ldots, X_{i_n})$ have the same multidimensional distribution. Consider the special case when the $X$'s are two-dimensional. Find an analog of Theorem 6 for exchangeable bivariate RVs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.


7.4 CHI-SQUARE, t-, AND F-DISTRIBUTIONS: EXACT 
SAMPLING DISTRIBUTIONS 


In this section we investigate certain distributions that arise in sampling from a normal population. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then we know that $\bar X \sim N(\mu, \sigma^2/n)$. Also, $\{\sqrt{n}(\bar X - \mu)/\sigma\}^2$ is $\chi^2(1)$. We determine the distribution of $S^2$ in the next section. Here we define mainly chi-square, $t$-, and $F$-distributions and study their properties. Their importance will become evident in the next section and later in the testing of statistical hypotheses (Chapter 10).

The first distribution of interest is the chi-square distribution, defined in Chapter 5 as a special case of the gamma distribution. Let $n > 0$ be an integer. Then $G(n/2, 2)$ is a $\chi^2(n)$ RV. In view of Theorem 5.3.29 and Corollary 2 to Theorem 5.3.4, the following result holds.


Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, and let $S_n = \sum_{k=1}^{n} X_k$. Then

(a) $S_n \sim \chi^2(n) \iff X_1 \sim \chi^2(1)$, and

(b) $X_1 \sim N(0, 1) \Rightarrow \sum_{k=1}^{n} X_k^2 \sim \chi^2(n)$.

If $X$ has a chi-square distribution with $n$ d.f., we write $X \sim \chi^2(n)$. We recall that if $X \sim \chi^2(n)$, its PDF is given by
$$f(x) = \begin{cases} \dfrac{x^{n/2-1} e^{-x/2}}{2^{n/2}\,\Gamma(n/2)} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \tag{1}$$
the MGF by
$$M(t) = (1 - 2t)^{-n/2}, \quad t < \tfrac12, \tag{2}$$
and the mean and the variance by
$$EX = n, \quad \operatorname{var}(X) = 2n. \tag{3}$$

The $\chi^2(n)$ distribution is tabulated for values of $n = 1, 2, \ldots$. Tables usually go up to $n = 30$, since for $n > 30$ it is possible to use the normal approximation. In Fig. 1 we plot the PDF (1) for selected values of $n$.

We will write $\chi^2_{n,\alpha}$ for the upper $\alpha$ percent point of the $\chi^2(n)$ distribution, that is,
$$P\{\chi^2(n) > \chi^2_{n,\alpha}\} = \alpha. \tag{4}$$
Table ST3 at the end of the book gives the values of $\chi^2_{n,\alpha}$ for some selected values of $n$ and $\alpha$.





Fig. 1. Chi-square densities.

Example 1. Let $n = 25$. Then, from Table ST3,
$$P\{\chi^2(25) \le 34.382\} = 0.90.$$
Let us approximate this probability using the CLT. We see that $E\chi^2(25) = 25$, $\operatorname{var}\chi^2(25) = 50$, so that
$$P\{\chi^2(25) \le 34.382\} = P\left\{\frac{\chi^2(25) - 25}{\sqrt{50}} \le \frac{34.382 - 25}{5\sqrt{2}}\right\} \approx P\{Z \le 1.32\} = 0.9066.$$
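The two numbers in Example 1 can be reproduced without tables. In the sketch below, the $\chi^2(n)$ DF is written as the regularized lower incomplete gamma function $P(n/2, x/2)$, evaluated by its standard power series. The normal DF is obtained from `math.erf`. The helper names are hypothetical. Because the sketch does not round the $z$-value to 1.32, the normal approximation comes out near 0.908 rather than 0.9066.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-15, max_terms=1000):
    """P(a, x) = x^a e^(-x) / Gamma(a + 1) * sum_k x^k / ((a + 1)(a + 2)...(a + k))."""
    term, total = 1.0, 1.0           # k = 0 term of the series
    for k in range(1, max_terms + 1):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a + 1)) * total


def chi2_cdf(x, n):
    """P{chi^2(n) <= x}, since chi^2(n) is G(n/2, 2)."""
    return regularized_lower_gamma(n / 2, x / 2)


def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


n, c = 25, 34.382
exact = chi2_cdf(c, n)
z = (c - n) / math.sqrt(2 * n)       # CLT: mean n, variance 2n
approx = std_normal_cdf(z)
print("exact:", round(exact, 4), " z:", round(z, 4), " normal approximation:", round(approx, 4))
```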

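Definition 1. Let $X_1, X_2, \ldots, X_n$ be independent normal RVs with $EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, let $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$. The RV $Y$ is said to be a noncentral chi-square RV with noncentrality parameter $\delta = \sum_{i=1}^{n} \mu_i^2/\sigma^2$ and $n$ d.f. We will write $Y \sim \chi^2(n, \delta)$, where $\delta = \sum_{i=1}^{n} \mu_i^2/\sigma^2$.

Although the PDF of a $\chi^2(n, \delta)$ RV is hard to compute (see Problem 16), its MGF is easily evaluated. We have
$$M(t) = E e^{t\sum_{i=1}^{n} X_i^2/\sigma^2} = \prod_{i=1}^{n} E e^{tX_i^2/\sigma^2},$$
where $X_i \sim N(\mu_i, \sigma^2)$. Thus
$$E e^{tX_i^2/\sigma^2} = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2}\right\} dx_i,$$
where the integral exists for $t < \frac12$. In the integrand we complete squares, and after some simple algebra we obtain
$$E e^{tX_i^2/\sigma^2} = \frac{1}{\sqrt{1 - 2t}} \exp\left\{\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right\}, \quad t < \tfrac12.$$
It follows that
$$M(t) = (1 - 2t)^{-n/2} \exp\left\{\frac{t\sum_{i=1}^{n} \mu_i^2}{\sigma^2(1 - 2t)}\right\}, \quad t < \tfrac12, \tag{5}$$
and the MGF of a $\chi^2(n, \delta)$ RV is therefore
$$M(t) = (1 - 2t)^{-n/2} \exp\left(\frac{t\delta}{1 - 2t}\right), \quad t < \tfrac12. \tag{6}$$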
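It is immediate that if $Y_1, Y_2, \ldots, Y_k$ are independent, $Y_i \sim \chi^2(n_i, \delta_i)$, $i = 1, 2, \ldots, k$, then $\sum_{i=1}^{k} Y_i$ is $\chi^2\left(\sum_{i=1}^{k} n_i, \sum_{i=1}^{k} \delta_i\right)$.
The mean and variance of $\chi^2(n, \delta)$ are easy to calculate. We have
$$EY = \frac{\sum_{i=1}^{n} EX_i^2}{\sigma^2} = \frac{\sum_{i=1}^{n} [\operatorname{var}(X_i) + (EX_i)^2]}{\sigma^2} = \frac{n\sigma^2 + \sum_{i=1}^{n} \mu_i^2}{\sigma^2} = n + \delta$$
and
$$\operatorname{var}(Y) = \operatorname{var}\left(\sum_{i=1}^{n} \frac{X_i^2}{\sigma^2}\right) = \frac{1}{\sigma^4}\sum_{i=1}^{n} \left[EX_i^4 - (EX_i^2)^2\right]$$
$$= \frac{1}{\sigma^4}\sum_{i=1}^{n} \left[(3\sigma^4 + 6\sigma^2\mu_i^2 + \mu_i^4) - (\sigma^2 + \mu_i^2)^2\right] = \frac{1}{\sigma^4}\left(2n\sigma^4 + 4\sigma^2\sum_{i=1}^{n} \mu_i^2\right) = 2n + 4\delta.$$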
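The results $EY = n + \delta$ and $\operatorname{var}(Y) = 2n + 4\delta$ can be cross-checked against the Poisson-mixture representation of Problem 16(b). There, $Y$ is a mixture of $\chi^2(2j + n)$ RVs with weights $e^{-\delta/2}(\delta/2)^j/j!$. The sketch below sums the mixture moments, truncated after a number of terms that is ample for moderate $\delta$ (an assumption of the example). It uses the $\chi^2(k)$ moments $EX = k$ and $EX^2 = 2k + k^2$ from (3). The function name is hypothetical.

```python
import math


def noncentral_chi2_moments(n, delta, terms=200):
    """Mean and variance of chi^2(n, delta) from its Poisson mixture of chi^2(2j + n) laws.

    The truncation after `terms` components is adequate only for moderate delta."""
    lam = delta / 2
    w = math.exp(-lam)               # Poisson weight for j = 0
    mean = second = 0.0
    for j in range(terms):
        if j > 0:
            w *= lam / j             # weight for j from the weight for j - 1
        k = 2 * j + n
        mean += w * k                # E chi^2(k) = k
        second += w * (2 * k + k * k)  # E[chi^2(k)^2] = var + mean^2
    return mean, second - mean**2


for n, delta in ((4, 3.0), (1, 0.5), (10, 7.5)):
    m, v = noncentral_chi2_moments(n, delta)
    print(n, delta, round(m, 10), n + delta, round(v, 10), 2 * n + 4 * delta)
```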


We next turn our attention to Student's t-statistic, which arises quite naturally in 
sampling from a normal population. 


Definition 2. Let $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then the statistic
$$T = \frac{X}{\sqrt{Y/n}} \tag{7}$$
is said to have a $t$-distribution with $n$ d.f. and we write $T \sim t(n)$.

Theorem 2. The PDF of $T$ defined in (7) is given by
$$f_n(t) = \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \quad -\infty < t < \infty. \tag{8}$$
The proof is left as an exercise.

Remark 1. For $n = 1$, $T$ is a Cauchy RV. We will therefore assume that $n > 1$. For each $n$, we have a different PDF. In Fig. 2 we plot $f_n(t)$ for some selected values of $n$. Like the normal distribution, the $t$-distribution is important in the theory of statistics and hence is tabulated (Table ST4).
Fig. 2. Student's $t$-densities.

Remark 2. The PDF $f_n(t)$ is symmetric in $t$, and $f_n(t) \to 0$ as $t \to \pm\infty$. For large $n$, the $t$-distribution is close to the normal distribution. Indeed, $(1 + t^2/n)^{-(n+1)/2} \to e^{-t^2/2}$ as $n \to \infty$. Moreover, as $t \to \infty$ or $t \to -\infty$, the tails of $f_n(t)$ tend to 0 much more slowly than do the tails of the $N(0, 1)$ PDF. Thus for small $n$ and large $t_0$,
$$P\{|T| > t_0\} > P\{|Z| > t_0\}, \quad Z \sim N(0, 1);$$
that is, there is more probability in the tail of the $t$-distribution than in the tail of the standard normal. In what follows we write $t_{n,\alpha/2}$ for the value (Fig. 3) of $T$ for which
$$P\{|T| > t_{n,\alpha/2}\} = \alpha. \tag{9}$$
In Table ST4 positive values of $t_{n,\alpha}$ are tabulated for selected values of $n$ and $\alpha$. Negative values may be obtained from symmetry, $t_{n,1-\alpha} = -t_{n,\alpha}$.

Example 2. Let $n = 5$. Then from Table ST4, we get $t_{5,0.025} = 2.571$ and $t_{5,0.05} = 2.015$. The corresponding values under the $N(0, 1)$ distribution are $z_{0.025} = 1.96$ and $z_{0.05} = 1.65$. For $n = 30$,
$$t_{30,0.05} = 1.697 \quad \text{and} \quad z_{0.05} = 1.65.$$
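The table values in Example 2 can be checked by integrating (8) numerically. By symmetry, $P\{T > t_0\} = \frac12 - \int_0^{t_0} f_n(t)\,dt$. The sketch below uses composite Simpson's rule and hypothetical helper names. The upper-tail probabilities at 2.571 and 2.015 (for $n = 5$) and at 1.697 (for $n = 30$) should come out close to 0.025, 0.05, and 0.05. The last comparison shows the heavier $t$ tail described in Remark 2.

```python
import math


def t_pdf(t, n):
    """Student's t density (8) with n degrees of freedom."""
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n * math.pi)
    return c * (1 + t * t / n) ** (-(n + 1) / 2)


def simpson(f, a, b, m=10_000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def t_upper_tail(t0, n):
    """P{T > t0} for t0 >= 0, using the symmetry of f_n about 0."""
    return 0.5 - simpson(lambda t: t_pdf(t, n), 0.0, t0)


print(round(t_upper_tail(2.571, 5), 4), round(t_upper_tail(2.015, 5), 4),
      round(t_upper_tail(1.697, 30), 4))
print("normal tail beyond 2.571:", round(0.5 * math.erfc(2.571 / math.sqrt(2)), 4))
```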


Theorem 3. Let $X \sim t(n)$, $n > 1$. Then $EX^r$ exists for $r < n$. In particular, if $r < n$ is odd,
$$EX^r = 0, \tag{10}$$
and if $r < n$ is even,
$$EX^r = n^{r/2}\,\frac{\Gamma[(r+1)/2]\,\Gamma[(n-r)/2]}{\Gamma(1/2)\,\Gamma(n/2)}. \tag{11}$$

Fig. 3. The $t(n)$ density, with probability $\alpha/2$ below $-t_{n,\alpha/2}$ and $\alpha/2$ above $t_{n,\alpha/2}$.

Corollary. If $n > 2$, $EX = 0$ and $EX^2 = \operatorname{var}(X) = n/(n-2)$.

Remark 3. If in Definition 2 we take $X \sim N(\mu, \sigma^2)$, $Y/\sigma^2 \sim \chi^2(n)$, and $X$ and $Y$ independent,
$$T = \frac{X}{\sqrt{Y/n}}$$
is said to have a noncentral $t$-distribution with parameter (also called noncentrality parameter) $\delta = \mu/\sigma$ and d.f. $n$. Various moments of the noncentral $t$-distribution may be computed by using the fact that the expectation of a product of independent RVs is the product of their expectations.

We leave the reader to show (Problem 3) that if $T$ has a noncentral $t$-distribution with $n$ d.f. and noncentrality parameter $\delta$, then
$$ET = \delta\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\sqrt{\frac{n}{2}}, \quad n > 1, \tag{12}$$
and
$$\operatorname{var}(T) = \frac{n(1 + \delta^2)}{n - 2} - \frac{\delta^2 n}{2}\left(\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\right)^2, \quad n > 2. \tag{13}$$
Definition 3. Let $X$ and $Y$ be independent $\chi^2$ RVs with $m$ and $n$ d.f., respectively. The RV
$$F = \frac{X/m}{Y/n} \tag{14}$$
is said to have an $F$-distribution with $(m, n)$ d.f., and we write $F \sim F(m, n)$.

Theorem 4. The PDF of the $F$-statistic defined in (14) is given by
$$g(f) = \begin{cases} \dfrac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left(\dfrac{m}{n}\right)^{m/2} f^{m/2-1} \left(1 + \dfrac{m}{n}f\right)^{-(m+n)/2}, & f > 0, \\ 0, & f \le 0. \end{cases} \tag{15}$$
The proof is left as an exercise.

Remark 4. If $X \sim F(m, n)$, then $1/X \sim F(n, m)$. If we take $m = 1$, then $F = [t(n)]^2$, so that $F(1, n)$ and $t^2(n)$ have the same distribution. It also follows that if $Z$ is $C(1, 0)$ [which is the same as $t(1)$], $Z^2$ is $F(1, 1)$.

Remark 5. As usual, we write $F_{m,n,\alpha}$ for the upper $\alpha$ percent point of the $F(m, n)$ distribution, that is,
$$P\{F(m, n) > F_{m,n,\alpha}\} = \alpha. \tag{16}$$
From Remark 4, we have the following relation:
$$F_{m,n,1-\alpha} = \frac{1}{F_{n,m,\alpha}}. \tag{17}$$
It therefore suffices to tabulate values of $F$ that are $\ge 1$. This is done in Table ST5, where values of $F_{m,n,\alpha}$ are listed for selected values of $m$, $n$, and $\alpha$. See Fig. 4 for a plot of $g(f)$.

Theorem 5. Let $X \sim F(m, n)$. Then, for $k > 0$, integral,
$$EX^k = \left(\frac{n}{m}\right)^k \frac{\Gamma[(m/2) + k]\,\Gamma[(n/2) - k]}{\Gamma(m/2)\,\Gamma(n/2)} \quad \text{for } n > 2k. \tag{18}$$
In particular,
$$EX = \frac{n}{n-2}, \quad n > 2, \tag{19}$$
and
$$\operatorname{var}(X) = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)}, \quad n > 4. \tag{20}$$
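Fig. 4. $F$ densities.

Proof. We have for a positive integer $k$,
$$\int_0^{\infty} f^{k + m/2 - 1}\left(1 + \frac{m}{n}f\right)^{-(m+n)/2} df = \left(\frac{n}{m}\right)^{k + m/2} \int_0^1 x^{m/2 + k - 1}(1 - x)^{n/2 - k - 1}\, dx, \tag{21}$$
where we have changed the variable to $x = (m/n)f\,[1 + (m/n)f]^{-1}$. The integral on the right side of (21) converges for $(n/2) - k > 0$ and diverges for $(n/2) - k \le 0$. We have
$$EX^k = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)}\left(\frac{m}{n}\right)^{m/2}\left(\frac{n}{m}\right)^{k + m/2} B\left(\frac{m}{2} + k, \frac{n}{2} - k\right) = \left(\frac{n}{m}\right)^k \frac{\Gamma[(m/2) + k]\,\Gamma[(n/2) - k]}{\Gamma(m/2)\,\Gamma(n/2)},$$
as asserted. For $k = 1$ we get
$$EX = \frac{n}{m}\cdot\frac{m/2}{(n/2) - 1} = \frac{n}{n-2}, \quad n > 2.$$
Also,
$$EX^2 = \left(\frac{n}{m}\right)^2 \frac{(m/2)[(m/2) + 1]}{[(n/2) - 1][(n/2) - 2]} = \frac{n^2(m+2)}{m(n-2)(n-4)}, \quad n > 4,$$
and
$$\operatorname{var}(X) = \frac{n^2(m+2)}{m(n-2)(n-4)} - \left(\frac{n}{n-2}\right)^2 = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)}, \quad n > 4.$$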
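Formula (18) is easy to evaluate with log-gamma functions. The sketch below computes $EX$ and $\operatorname{var}(X)$ for $X \sim F(5, 12)$, an illustrative choice of degrees of freedom, and compares them with (19) and (20), which give 1.2 and 1.08 here. The function name `f_moment` is hypothetical. It raises an error when $n \le 2k$, because the integral in (21) then diverges.

```python
import math


def f_moment(k, m, n):
    """EX^k for X ~ F(m, n) from (18); finite only when n > 2k."""
    if n <= 2 * k:
        raise ValueError("EX^k is finite only for n > 2k")
    log_ratio = (math.lgamma(m / 2 + k) + math.lgamma(n / 2 - k)
                 - math.lgamma(m / 2) - math.lgamma(n / 2))
    return (n / m) ** k * math.exp(log_ratio)


m, n = 5, 12                          # illustrative degrees of freedom
mean = f_moment(1, m, n)
var = f_moment(2, m, n) - mean**2
print(mean, n / (n - 2))
print(var, 2 * n**2 * (m + n - 2) / (m * (n - 2) ** 2 * (n - 4)))
```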
Theorem 6. If $X \sim F(m, n)$, then $Y = 1/[1 + (m/n)X]$ is $B(n/2, m/2)$. Consequently, for each $x > 0$,
$$F_X(x) = 1 - F_Y\left(\frac{1}{1 + (m/n)x}\right).$$
If in Definition 3 we take $X$ to be a noncentral $\chi^2$ RV with $m$ d.f. and noncentrality parameter $\delta$, we get a noncentral $F$ RV.

Definition 4. Let $X \sim \chi^2(m, \delta)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then the RV
$$F = \frac{X/m}{Y/n} \tag{22}$$
is said to have a noncentral $F$-distribution with $(m, n)$ d.f. and noncentrality parameter $\delta$.

It is shown in Problem 2 that if $F$ has a noncentral $F$-distribution with $(m, n)$ d.f. and noncentrality parameter $\delta$,
$$EF = \frac{n(m + \delta)}{m(n-2)}, \quad n > 2,$$
and
$$\operatorname{var}(F) = \frac{2n^2\left[(m + \delta)^2 + (n-2)(m + 2\delta)\right]}{m^2(n-2)^2(n-4)}, \quad n > 4.$$
PROBLEMS 7.4

1. Let
$$F_n(x) = \left[\Gamma\left(\frac{n}{2}\right)2^{n/2}\right]^{-1} \int_0^x \omega^{n/2 - 1} e^{-\omega/2}\, d\omega, \quad x > 0.$$
Show that

2. Let $X \sim F(m, n, \delta)$. Find $EX$ and $\operatorname{var}(X)$.

3. Let $T$ be a noncentral $t$-statistic with $n$ d.f. and noncentrality parameter $\delta$. Find $ET$ and $\operatorname{var}(T)$.

4. Let $F \sim F(m, n)$. Then
$$Y = \left(1 + \frac{m}{n}F\right)^{-1} \sim B\left(\frac{n}{2}, \frac{m}{2}\right).$$
Deduce that for $x > 0$,
$$P\{F \le x\} = 1 - P\left\{Y \le \left(1 + \frac{m}{n}x\right)^{-1}\right\}.$$

5. Derive the PDF of an $F$-statistic with $(m, n)$ d.f.

6. Show that the square of a noncentral $t$-statistic is a noncentral $F$-statistic.

7. A sample of size 16 showed a variance of 5.76. Find $c$ such that $P\{|\bar X - \mu| < c\} = 0.95$, where $\bar X$ is the sample mean and $\mu$ is the population mean. Assume that the sample comes from a normal population.

8. A sample from a normal population produced variance 4.0. Find the size of the sample if the sample mean deviates from the population mean by no more than 2.0 with a probability of at least 0.95.

9. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $N(0, 4)$. Find $P\{\sum_{i=1}^{5} X_i^2 > 5.75\}$.

10. Let $X \sim \chi^2(61)$. Find $P\{X > 50\}$.

11. Let $F \sim F(m, n)$. The random variable $Z = \frac12 \log F$ is known as Fisher's $Z$-statistic. Find the PDF of $Z$.

12. Prove Theorem 1.

13. Prove Theorem 2.

14. Prove Theorem 3.

15. Prove Theorem 4.

16. (a) Let $f_1, f_2, \ldots$ be PDFs with corresponding MGFs $M_1, M_2, \ldots$, respectively. Let $\alpha_j$ $(0 \le \alpha_j \le 1)$ be constants such that $\sum_{j=1}^{\infty} \alpha_j = 1$. Then $f = \sum_{j=1}^{\infty} \alpha_j f_j$ is a PDF with MGF $M = \sum_{j=1}^{\infty} \alpha_j M_j$.
(b) Write the MGF of a $\chi^2(n, \delta)$ RV in (6) as
$$M(t) = \sum_{j=0}^{\infty} \alpha_j M_j(t),$$
where $M_j(t) = (1 - 2t)^{-(2j+n)/2}$ is the MGF of a $\chi^2(2j + n)$ RV and $\alpha_j = e^{-\delta/2}(\delta/2)^j/j!$ is the PMF of a $P(\delta/2)$ RV. Conclude that the PDF of $Y \sim \chi^2(n, \delta)$ is the weighted sum of PDFs of $\chi^2(2j + n)$ RVs, $j = 0, 1, 2, \ldots$, with Poisson weights, and hence
$$f_Y(y) = \sum_{j=0}^{\infty} \frac{e^{-\delta/2}(\delta/2)^j}{j!}\cdot\frac{y^{(2j+n)/2 - 1} \exp(-y/2)}{2^{(2j+n)/2}\,\Gamma[(2j+n)/2]}, \quad y > 0.$$


7.5 LARGE-SAMPLE THEORY 


In many applications of probability one needs the distribution of a statistic or some 
function of it. The methods of Section 7.3 when applicable lead to the exact distri- 
bution of the statistic under consideration. If not, it may be sufficient to approximate 
this distribution provided that the sample size is large enough. 

Let $\{X_n\}$ be a sequence of RVs that converges in law to $N(\mu, \sigma^2)$. Then $\{(X_n - \mu)/\sigma\}$ converges in law to $N(0, 1)$, and conversely. We will say alternatively and equivalently that $\{X_n\}$ is asymptotically normal with mean $\mu$ and variance $\sigma^2$. More generally, we say that $X_n$ is asymptotically normal with "mean" $\mu_n$ and "variance" $\sigma_n^2$, and write $X_n$ is $AN(\mu_n, \sigma_n^2)$, if $\sigma_n > 0$ and as $n \to \infty$,
$$\frac{X_n - \mu_n}{\sigma_n} \xrightarrow{L} N(0, 1). \tag{1}$$
Here $\mu_n$ is not necessarily the mean of $X_n$ and $\sigma_n^2$ not necessarily its variance. In this case we can approximate, for sufficiently large $n$, $P\{X_n \le t\}$ by $P\{Z \le (t - \mu_n)/\sigma_n\}$, where $Z$ is $N(0, 1)$.

The most common method to show that $X_n$ is $AN(\mu_n, \sigma_n^2)$ is the central limit theorem of Section 6.6. Thus, according to Theorem 6.6.1, $\sqrt{n}(\bar X_n - \mu) \xrightarrow{L} N(0, \sigma^2)$ as $n \to \infty$, where $\bar X_n$ is the sample mean of $n$ iid RVs with mean $\mu$ and variance $\sigma^2$. The same result applies to the $k$th sample moment, provided that $E|X|^{2k} < \infty$. Thus
$$\frac{1}{n}\sum_{j=1}^{n} X_j^k \text{ is } AN\left(EX^k, \frac{EX^{2k} - (EX^k)^2}{n}\right).$$

In many large-sample approximations an application of the CLT along with Slutsky’s 
theorem suffices. 


Example 1. Let $X_1, X_2, \ldots$ be iid $N(\mu, \sigma^2)$. Consider the RV
$$T_n = \frac{\sqrt{n}(\bar X - \mu)}{S}.$$
The statistic $T_n$ is well known for its applications in statistics, and in Section 7.6 we determine its exact distribution. From Example 6.3.4, $(n-1)S^2/n \xrightarrow{P} \sigma^2$, and hence $S/\sigma \xrightarrow{P} 1$. Since $\sqrt{n}(\bar X - \mu)/\sigma \xrightarrow{L} Z \sim N(0, 1)$, it follows from Slutsky's theorem that $T_n \xrightarrow{L} Z$. Thus for sufficiently large $n$ ($n \ge 30$), we can approximate $P\{T_n \le t\}$ by $P\{Z \le t\}$.
Actually, we do not need the $X$'s to be normally distributed (see Problem 6.6.5).
Often, we need to approximate the distribution of $g(Y_n)$ given that $Y_n$ is $AN(\mu, \sigma_n^2)$.


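Theorem 1. Suppose that $Y_n$ is $AN(\mu, \sigma_n^2)$, with $\sigma_n \to 0$ and $\mu$ a fixed real number. Let $g$ be a real-valued function that is differentiable at $x = \mu$, with $g'(\mu) \ne 0$. Then
$$g(Y_n) \text{ is } AN\left(g(\mu), [g'(\mu)]^2\sigma_n^2\right). \tag{2}$$
Proof. We first show that
$$\frac{g(Y_n) - g(\mu)}{g'(\mu)\sigma_n} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0. \tag{3}$$
Set
$$h(x) = \begin{cases} \dfrac{g(x) - g(\mu)}{x - \mu} - g'(\mu), & x \ne \mu, \\ 0, & x = \mu. \end{cases}$$
Then $h$ is continuous at $x = \mu$. Since
$$\frac{Y_n - \mu}{\sigma_n} \xrightarrow{L} N(0, 1) \quad \text{and} \quad \sigma_n \to 0,$$
by Problem 6.2.7, $Y_n - \mu \xrightarrow{P} 0$, and it follows from Theorem 6.2.4 that $h(Y_n) \xrightarrow{P} h(\mu) = 0$. By Slutsky's theorem, therefore,
$$h(Y_n)\,\frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0.$$
That is,
$$\frac{g(Y_n) - g(\mu)}{\sigma_n} - g'(\mu)\,\frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0,$$
which, on dividing by $g'(\mu) \ne 0$, is (3). It follows again by Slutsky's theorem that $[g(Y_n) - g(\mu)]/[g'(\mu)\sigma_n]$ has the same limit law as $(Y_n - \mu)/\sigma_n$.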


Example 2. We know by the CLT that $Y_n = \bar X$ is $AN(\mu, \sigma^2/n)$. Suppose that $g(\bar X) = \bar X(1 - \bar X)$, where $\bar X$ is the sample mean in random sampling from a population with mean $\mu$ and variance $\sigma^2$. Since $g'(\mu) = 1 - 2\mu \ne 0$ for $\mu \ne \frac12$, it follows that for $\mu \ne \frac12$, $\sigma^2 < \infty$, $\bar X(1 - \bar X)$ is $AN(\mu(1 - \mu), (1 - 2\mu)^2\sigma^2/n)$. Thus
$$P\left\{\frac{\bar X(1 - \bar X) - \mu(1 - \mu)}{|1 - 2\mu|\,\sigma/\sqrt{n}} \le y\right\} \approx P\{Z \le y\}$$
for large $n$.


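Remark 1. Suppose that $g$ in Theorem 1 is differentiable $k$ times, $k \ge 1$, at $x = \mu$ and $g^{(i)}(\mu) = 0$ for $1 \le i \le k - 1$, $g^{(k)}(\mu) \ne 0$. Then a similar argument using Taylor's theorem shows that
$$\frac{g(Y_n) - g(\mu)}{\frac{1}{k!}\,g^{(k)}(\mu)\,\sigma_n^k} \xrightarrow{L} Z^k, \tag{4}$$
where $Z$ is an $N(0, 1)$ RV. Thus in Example 2, when $\mu = \frac12$, $g'(\frac12) = 0$ and $g''(\frac12) = -2 \ne 0$. It follows that
$$n\left[\bar X(1 - \bar X) - \tfrac14\right] \xrightarrow{L} -\sigma^2\chi^2(1),$$
since $Z^2 \sim \chi^2(1)$.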


Remark 2. Theorem 1 can be extended to the multivariate case, but we will not 
pursue the development. We refer the reader to Ferguson [26] or Serfling [100]. 


Remark 3. Їп general, the asymptotic variance [g’ (u) Po? of g(Y4) will depend 
on the parameter д. In problems of inference it will often be desirable to use trans- 
formation g such that the approximate variance var g(Y,) is free of the parameter. 
Such transformations are called variance stabilizing transformations. Let us write 
о? = о?(и)/п. Then finding a g such that var g(Y,) is free of и is equivalent to 
finding a g such that 


/ E С 
g (и) = PP 


for al! р, where c is a constant independent of и. It follows that 


dx 
(5) g(x) = cf EE 


a(x) 
Example 3. Suppose that $X_1, X_2, \ldots, X_n$ are iid $b(1, p)$, so that in Example 2, $\mu = p$ and $\sigma^2(p) = p(1 - p)$. Then (5) reduces to

$g(x) = c\displaystyle\int \frac{dx}{\sqrt{x(1 - x)}} = 2c\arcsin\sqrt{x}.$

If we normalize g so that g(0) = 0 and g(1) = 1, then $c = 1/\pi$ and $g(x) = (2/\pi)\arcsin\sqrt{x}$.
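A small simulation shows the stabilizing effect of the transformation in Example 3. The sketch below (a hypothetical helper, with arbitrarily chosen n = 100 and 3000 replications) estimates $n\operatorname{var} g(\bar X)$ for $g(x) = (2/\pi)\arcsin\sqrt{x}$ at several values of p. By the delta method, $[g'(p)]^2 p(1 - p) = 1/\pi^2$ for every p, whereas the untransformed $n\operatorname{var}(\bar X) = p(1 - p)$ changes with p.

```python
import math
import random
import statistics

def arcsine_scaled_variance(p, n=100, reps=3000, seed=3):
    """n * var[g(Xbar)] for g(x) = (2/pi) arcsin(sqrt(x)) and iid b(1, p) data."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xbar = sum(rng.random() < p for _ in range(n)) / n
        values.append((2 / math.pi) * math.asin(math.sqrt(xbar)))
    return n * statistics.variance(values)

target = 1 / math.pi ** 2        # [g'(p)]^2 p(1 - p), the same for every p
results = {p: arcsine_scaled_variance(p) for p in (0.2, 0.5, 0.8)}
for p, v in results.items():
    print(f"p = {p}: n*var = {v:.4f}, untransformed p(1-p) = {p * (1 - p):.4f}, target = {target:.4f}")
```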


Remark 4. In Section 7.3 we computed exact moments of some statistics in terms of population parameters. Approximations for moments of g(X) can also be obtained from series expansions of g. Suppose that g is twice differentiable at $x = \mu$. Then

(6) $Eg(X) \approx g(\mu) + E(X - \mu)\,g'(\mu) + \tfrac{1}{2}g''(\mu)\,E(X - \mu)^2$

and

(7) $E[g(X) - g(\mu)]^2 \approx [g'(\mu)]^2\,E(X - \mu)^2,$

by dropping remainder terms. The case of most interest is to approximate $Eg(\bar X)$ and $\operatorname{var} g(\bar X)$. In this case, under suitable conditions, one can show that

(8) $Eg(\bar X) \approx g(\mu) + \dfrac{\sigma^2}{2n}\,g''(\mu)$

and

(9) $\operatorname{var} g(\bar X) \approx \dfrac{\sigma^2}{n}\,[g'(\mu)]^2,$

where $EX = \mu$ and $\operatorname{var}(X) = \sigma^2$.


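In Example 2, when the $X_i$'s are iid $b(1, p)$, $g(x) = x(1 - x)$, $g'(x) = 1 - 2x$, $g''(x) = -2$, so that

$Eg(\bar X) = E[\bar X(1 - \bar X)] \approx p(1 - p) + \dfrac{p(1 - p)}{2n}(-2) = p(1 - p) - \dfrac{p(1 - p)}{n}$

and

$\operatorname{var} g(\bar X) \approx \dfrac{p(1 - p)}{n}\,(1 - 2p)^2.$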

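In this case we can compute $Eg(\bar X)$ and $\operatorname{var} g(\bar X)$ exactly. We have $E\bar X = p$ and $E\bar X^2 = \operatorname{var}(\bar X) + (E\bar X)^2 = p(1 - p)/n + p^2$, so that

$Eg(\bar X) = E\bar X - E\bar X^2 = p - \left[\dfrac{p(1 - p)}{n} + p^2\right] = p(1 - p) - \dfrac{p(1 - p)}{n},$

and (8) is exact. Also, since $X_i^k = X_i$, using Theorem 7.3.4 we have

$\operatorname{var} g(\bar X) = \operatorname{var}(\bar X - \bar X^2) = \operatorname{var}(\bar X) - 2\operatorname{cov}(\bar X, \bar X^2) + \operatorname{var}(\bar X^2) = \dfrac{p(1 - p)(n - 1)}{n^3}\left[(n - 1) - 2(2n - 3)p(1 - p)\right].$

Thus the error in approximation (9) is

$\text{error} = \operatorname{var} g(\bar X) - \dfrac{p(1 - p)(1 - 2p)^2}{n} = -\dfrac{p(1 - p)}{n^3}\left[(2n - 1) - 2(5n - 3)p(1 - p)\right],$

which is of order $n^{-2}$.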
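The exact moments in this example can also be checked by brute force, because $n\bar X$ is $b(n, p)$. The sketch below (helper names are hypothetical) enumerates the binomial PMF and compares the exact mean with approximation (8), which should agree up to rounding error, and the exact variance with the closed form $p(1 - p)(n - 1)[(n - 1) - 2(2n - 3)p(1 - p)]/n^3$ and with approximation (9).

```python
from math import comb

def exact_moments(n, p):
    """Exact E g(Xbar) and var g(Xbar), g(x) = x(1 - x), by enumerating n*Xbar ~ b(n, p)."""
    pmf = [comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(n + 1)]
    g = [(s / n) * (1 - s / n) for s in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, g))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, g))
    return mean, var

def approx_8_9(n, p):
    """Approximations (8) and (9) with sigma^2 = p(1 - p), g'(p) = 1 - 2p, g'' = -2."""
    pq = p * (1 - p)
    return pq - pq / n, pq * (1 - 2 * p) ** 2 / n

def exact_var_formula(n, p):
    """Closed form p(1-p)(n-1)[(n-1) - 2(2n-3)p(1-p)] / n^3."""
    pq = p * (1 - p)
    return pq * (n - 1) * ((n - 1) - 2 * (2 * n - 3) * pq) / n**3

for n, p in [(5, 0.3), (20, 0.1), (100, 0.5)]:
    (m, v), (m8, v9) = exact_moments(n, p), approx_8_9(n, p)
    print(n, p, m, m8, v, exact_var_formula(n, p), v9)
```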


Remark 5. Approximations (6) through (9) do not assert the existence of $Eg(X)$ or $Eg(\bar X)$, or of $\operatorname{var} g(X)$ or $\operatorname{var} g(\bar X)$.


Remark 6. It is possible to extend (6) through (9) to two (or more) variables by using Taylor series expansion in two (or more) variables.


Finally, we state the following result, which gives the asymptotic distribution of the rth order statistic, $1 \le r \le n$, in sampling from a population with an absolutely continuous DF F with PDF f. For a proof, see Problem 4.

Theorem 2. If $X_{(r)}$ denotes the rth-order statistic of a sample $X_1, X_2, \ldots, X_n$ from an absolutely continuous DF F with PDF f, then

(10) $\left[\dfrac{n}{p(1 - p)}\right]^{1/2} f(\zeta_p)\,(X_{(r)} - \zeta_p) \xrightarrow{L} Z$

as $n \to \infty$ in such a way that $r/n = p$ remains fixed, where Z is N(0, 1) and $\zeta_p$ is the unique solution of $F(\zeta_p) = p$ (that is, $\zeta_p$ is the population quantile of order p, assumed unique).


Remark 7. The sample quantile of order p, $Z_p$, is

$AN\left(\zeta_p, \dfrac{1}{[f(\zeta_p)]^2}\,\dfrac{p(1 - p)}{n}\right),$

where $\zeta_p$ is the corresponding population quantile and f is the PDF of the population distribution function. It also follows that $Z_p \xrightarrow{P} \zeta_p$.
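Remark 7 can be illustrated with the sample median (p = 1/2) of a standard normal sample, for which $\zeta_{1/2} = 0$ and $f(0) = 1/\sqrt{2\pi}$, so that $n\operatorname{var}(Z_{1/2})$ should be close to $p(1 - p)/[f(0)]^2 = \pi/2$. The following Python sketch assumes an odd sample size n = 101 and 2000 replications; these settings and the helper name are illustrative choices, not part of the text.

```python
import math
import random
import statistics

def scaled_median_variance(n=101, reps=2000, seed=5):
    """n * var(sample median) for samples of odd size n from N(0, 1)."""
    rng = random.Random(seed)
    medians = [statistics.median(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(reps)]
    return n * statistics.variance(medians)

# Remark 7 with p = 1/2: zeta_p = 0, f(0) = 1/sqrt(2 pi), so p(1 - p)/f(0)^2 = pi/2.
simulated = scaled_median_variance()
print(f"simulated {simulated:.3f}, asymptotic {math.pi / 2:.3f}")
```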
PROBLEMS 7.5

1. In sampling from a distribution with mean $\mu$ and variance $\sigma^2$, find the asymptotic distribution of (a) $\bar X^2$, (b) $1/\bar X$, (c) $\ln|\bar X|$, and (d) $\exp(\bar X)$, both when $\mu \ne 0$ and when $\mu = 0$.

2. Let $X \sim P(\lambda)$. Then $(X - \lambda)/\sqrt{\lambda} \xrightarrow{L} N(0, 1)$. Find a transformation g such that $g(X) - g(\lambda)$ has an asymptotic $N(0, c)$ distribution for large $\lambda$, where c is a suitable constant.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous DF F with PDF f. Show that

$EX_{(r)} \approx F^{-1}\left(\dfrac{r}{n + 1}\right)$

and

$\operatorname{var}(X_{(r)}) \approx \dfrac{r(n - r + 1)}{(n + 1)^2(n + 2)}\,\dfrac{1}{\{f[F^{-1}(r/(n + 1))]\}^2}.$

[Hint: Let Y be an RV with mean $\mu$, and $\phi$ be a Borel function such that $E\phi(Y)$ exists. Expand $\phi(Y)$ about the point $\mu$ by a Taylor series expansion, and use the fact that $F(X_{(r)}) = U_{(r)}$.]

4. Prove Theorem 2. [Hint: For any real $\mu$ and $\sigma$ (> 0), compute the PDF of $(U_{(r)} - \mu)/\sigma$ and show that the standardized $U_{(r)}$, $(U_{(r)} - \mu)/\sigma$, is asymptotically N(0, 1) under the conditions of the theorem.]

5. Let $X \sim \chi^2(n)$. Then $(X - n)/\sqrt{2n}$ is AN(0, 1) and X/n is AN(1, 2/n). Find a transformation g such that the distribution of $g(X) - g(n)$ is AN(0, c).

6. Suppose that X is $G(1, \theta)$. Find g such that $g(X) - g(\theta)$ is AN(0, c).

7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $E|X_1|^4 < \infty$. Let $\operatorname{var}(X) = \sigma^2$ and $\beta_2 = \mu_4/\sigma^4$.
(a) Using the CLT for iid RVs, show that $\sqrt{n}(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4)$.
(b) Find a transformation g such that $g(S^2)$ has an asymptotic distribution that depends on $\beta_2$ alone, not on $\sigma^2$.

7.6 DISTRIBUTION OF ($\bar X$, $S^2$) IN SAMPLING FROM A NORMAL POPULATION


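Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, and write $\bar X = n^{-1}\sum_{i=1}^n X_i$ and $S^2 = (n - 1)^{-1}\sum_{i=1}^n (X_i - \bar X)^2$. In this section we show that $\bar X$ and $S^2$ are independent and derive the distribution of $S^2$. More precisely, we prove the following important result.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Then $\bar X$ and $(X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X)$ are independent.

Proof. We compute the MGF of $\bar X$ and $X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X$ as follows:

$M(t, t_1, t_2, \ldots, t_n) = E\exp\{t\bar X + t_1(X_1 - \bar X) + t_2(X_2 - \bar X) + \cdots + t_n(X_n - \bar X)\}$

$= E\exp\left\{\sum_{i=1}^n t_iX_i - \left(\sum_{i=1}^n t_i - t\right)\bar X\right\}$

$= E\exp\left\{\sum_{i=1}^n X_i\,\dfrac{t + n(t_i - \bar t)}{n}\right\}, \quad \text{where } \bar t = n^{-1}\sum_{i=1}^n t_i,$

$= \prod_{i=1}^n E\exp\left\{X_i\,\dfrac{t + n(t_i - \bar t)}{n}\right\}$

$= \prod_{i=1}^n \exp\left\{\dfrac{\mu[t + n(t_i - \bar t)]}{n} + \dfrac{\sigma^2}{2n^2}[t + n(t_i - \bar t)]^2\right\}$

$= \exp\left\{\dfrac{\mu}{n}\left[nt + n\sum_{i=1}^n (t_i - \bar t)\right] + \dfrac{\sigma^2}{2n^2}\sum_{i=1}^n [t + n(t_i - \bar t)]^2\right\}$

$= \exp(\mu t)\exp\left\{\dfrac{\sigma^2}{2n^2}\left[nt^2 + n^2\sum_{i=1}^n (t_i - \bar t)^2\right]\right\} \quad \left(\text{since } \sum_{i=1}^n (t_i - \bar t) = 0\right)$

$= \exp\left\{\mu t + \dfrac{\sigma^2}{2n}t^2\right\}\exp\left\{\dfrac{\sigma^2}{2}\sum_{i=1}^n (t_i - \bar t)^2\right\}$

$= M_{\bar X}(t)\,M_{X_1 - \bar X, \ldots, X_n - \bar X}(t_1, \ldots, t_n)$

$= M(t, 0, 0, \ldots, 0)\,M(0, t_1, t_2, \ldots, t_n).$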


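Corollary 1. $\bar X$ and $S^2$ are independent.

Corollary 2. $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$.

Proof. Since

$\displaystyle\sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} = \frac{(n - 1)S^2}{\sigma^2} + \left(\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right)^2$

and $\bar X$ and $S^2$ are independent, it follows that

$E\exp\left\{t\displaystyle\sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2}\right\} = E\exp\left\{t\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right)^2\right\}E\exp\left\{t\,\frac{(n - 1)S^2}{\sigma^2}\right\}.$

The left side is $\chi^2(n)$ and $\sqrt{n}(\bar X - \mu)/\sigma$ is N(0, 1), so that

$(1 - 2t)^{-n/2} = (1 - 2t)^{-1/2}\,E\exp\left\{t\,\dfrac{(n - 1)S^2}{\sigma^2}\right\}, \quad t < \tfrac{1}{2},$

and we see that

$E\exp\left\{t\,\dfrac{(n - 1)S^2}{\sigma^2}\right\} = (1 - 2t)^{-(n - 1)/2}, \quad t < \tfrac{1}{2}.$

By the uniqueness of the MGF it follows that $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$.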
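A Monte Carlo sanity check of Corollary 2, under assumed values $\mu = 3$, $\sigma = 2$, and n = 8 (chosen only for illustration): the replicates of $(n - 1)S^2/\sigma^2$ should have mean close to $n - 1 = 7$ and variance close to $2(n - 1) = 14$, the moments of $\chi^2(7)$. The helper names are hypothetical; the sample variance is computed directly with divisor $n - 1$.

```python
import random
import statistics

def sample_variance(xs):
    """S^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def scaled_s2_replicates(mu=3.0, sigma=2.0, n=8, reps=10000, seed=6):
    """Replicates of (n - 1) S^2 / sigma^2 for samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [
        (n - 1) * sample_variance([rng.gauss(mu, sigma) for _ in range(n)]) / sigma**2
        for _ in range(reps)
    ]

w = scaled_s2_replicates()
w_mean, w_var = statistics.fmean(w), statistics.variance(w)
# chi^2(7) has mean 7 and variance 14
print(f"mean {w_mean:.3f}, variance {w_var:.3f}")
```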
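Corollary 3. The distribution of $\sqrt{n}(\bar X - \mu)/S$ is $t(n - 1)$.

Proof. Since $\sqrt{n}(\bar X - \mu)/\sigma$ is N(0, 1), $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$, and $\bar X$ and $S^2$ are independent,

$\dfrac{\sqrt{n}(\bar X - \mu)/\sigma}{\sqrt{[(n - 1)S^2/\sigma^2]/(n - 1)}} = \dfrac{\sqrt{n}(\bar X - \mu)}{S}$

is $t(n - 1)$.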


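Corollary 4. If $X_1, X_2, \ldots, X_m$ are iid $N(\mu_1, \sigma_1^2)$ RVs, $Y_1, Y_2, \ldots, Y_n$ are iid $N(\mu_2, \sigma_2^2)$ RVs, the two samples are taken independently, and $S_1^2$ and $S_2^2$ are the sample variances of the X's and the Y's, then $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ is $F(m - 1, n - 1)$. If, in particular, $\sigma_1 = \sigma_2$, then $S_1^2/S_2^2$ is $F(m - 1, n - 1)$.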


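Corollary 5. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$, respectively, be independent samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Then

$\dfrac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\left[(m - 1)S_1^2/\sigma_1^2 + (n - 1)S_2^2/\sigma_2^2\right]^{1/2}\left[\sigma_1^2/m + \sigma_2^2/n\right]^{1/2}}\,\sqrt{m + n - 2} \sim t(m + n - 2).$

In particular, if $\sigma_1 = \sigma_2$, then

$\dfrac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\left[(m - 1)S_1^2 + (n - 1)S_2^2\right]^{1/2}}\sqrt{\dfrac{mn(m + n - 2)}{m + n}} \sim t(m + n - 2).$

Corollary 5 follows since

$\dfrac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}} \sim N(0, 1)$

and

$\dfrac{(m - 1)S_1^2}{\sigma_1^2} + \dfrac{(n - 1)S_2^2}{\sigma_2^2} \sim \chi^2(m + n - 2),$

and the two statistics are independent.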
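For computation, the equal-variance statistic of Corollary 5 can be coded directly from its formula. In this sketch the function name `pooled_t` and the data are hypothetical, and `delta` stands for the value of $\mu_1 - \mu_2$ being assumed. Under the conditions of the corollary the returned number is an observed value of a $t(m + n - 2)$ RV. The algebraically equivalent form $[\bar X - \bar Y - (\mu_1 - \mu_2)]/[S_p\sqrt{1/m + 1/n}]$, with $S_p^2 = [(m - 1)S_1^2 + (n - 1)S_2^2]/(m + n - 2)$, is often quoted instead.

```python
import math
import statistics

def pooled_t(x, y, delta=0.0):
    """Equal-variance two-sample statistic of Corollary 5.

    delta is the assumed value of mu_1 - mu_2 (hypothetical parameter name).
    statistics.variance uses divisor len - 1, i.e., the sample variance S^2.
    """
    m, n = len(x), len(y)
    ss = (m - 1) * statistics.variance(x) + (n - 1) * statistics.variance(y)
    diff = statistics.fmean(x) - statistics.fmean(y) - delta
    return diff / math.sqrt(ss) * math.sqrt(m * n * (m + n - 2) / (m + n))

# Hypothetical measurements, for illustration only.
x = [5.1, 4.8, 5.6, 5.0]
y = [4.2, 4.9, 4.4]
print(pooled_t(x, y))   # compare with quantiles of t(5)
```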
Remark 1. The converse of Corollary 1 also holds (see Theorem 5.3.28). 


Remark 2. In sampling from a symmetric distribution, $\bar X$ and $S^2$ are uncorrelated (see Problem 4.5.14).


Remark 3. Alternatively, Corollary 1 could have been derived from Corollary 2 to Theorem 5.4.6 by using the Helmert orthogonal matrix:

$A = \begin{pmatrix} 1/\sqrt{n} & 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\ -1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & \cdots & (n-1)/\sqrt{n(n-1)} \end{pmatrix}.$

For the case of n = 3 this was done in Example 4.4.6. In Problem 7 the reader is asked to work out the details in the general case.
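The Helmert matrix of Remark 3 is easy to generate for any n. The Python sketch below (helper names and data are hypothetical) builds it row by row, applies it to a data vector, and checks the two facts the argument relies on: the first transformed coordinate is $\sqrt{n}\,\bar x$, and the sum of squares of the remaining $n - 1$ coordinates equals $\sum (x_i - \bar x)^2 = (n - 1)s^2$.

```python
import math

def helmert(n):
    """Rows of the n x n Helmert matrix of Remark 3."""
    rows = [[1 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        c = 1 / math.sqrt(k * (k - 1))
        # row k: k - 1 entries -c, then (k - 1) c, then zeros
        rows.append([-c] * (k - 1) + [(k - 1) * c] + [0.0] * (n - k))
    return rows

def matvec(a, x):
    """Matrix-vector product for a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

# Hypothetical observations, for illustration only.
x = [2.0, -1.0, 4.5, 3.0, 0.5]
n = len(x)
y = matvec(helmert(n), x)
xbar = sum(x) / n
ss = sum((xi - xbar) ** 2 for xi in x)          # (n - 1) s^2
print(y[0], math.sqrt(n) * xbar)                # Y_1 = sqrt(n) xbar
print(sum(yk * yk for yk in y[1:]), ss)         # Y_2^2 + ... + Y_n^2 = (n - 1) s^2
```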


Remark 4. An analytic approach to the development of the distribution of $\bar X$ and $S^2$ is as follows. Assuming without loss of generality that $X_i$ is N(0, 1), we have as the joint PDF of $(X_1, X_2, \ldots, X_n)$

$f(x_1, x_2, \ldots, x_n) = \dfrac{1}{(2\pi)^{n/2}}\exp\left(-\dfrac{1}{2}\sum_{i=1}^n x_i^2\right) = \dfrac{1}{(2\pi)^{n/2}}\exp\left\{-\dfrac{(n - 1)s^2 + n\bar x^2}{2}\right\}.$

Changing the variables to $y_1, y_2, \ldots, y_n$ by using the transformation $y_k = (x_k - \bar x)/s$, we see that

$\sum_{k=1}^n y_k = 0 \quad \text{and} \quad \sum_{k=1}^n y_k^2 = n - 1.$

It follows that two of the $y_k$'s, say $y_{n-1}$ and $y_n$, are functions of the remaining $y_k$'s. Thus either

$y_{n-1} = \dfrac{\alpha + \beta}{2} \quad \text{and} \quad y_n = \dfrac{\alpha - \beta}{2}$

or

$y_{n-1} = \dfrac{\alpha - \beta}{2} \quad \text{and} \quad y_n = \dfrac{\alpha + \beta}{2},$

where

$\alpha = -\sum_{k=1}^{n-2} y_k \quad \text{and} \quad \beta = \left[2(n - 1) - 2\sum_{k=1}^{n-2} y_k^2 - \left(\sum_{k=1}^{n-2} y_k\right)^2\right]^{1/2}.$

We leave it to the reader to derive the joint PDF of $(Y_1, Y_2, \ldots, Y_{n-2}, \bar X, S^2)$, using the result described in Remark 4.4.2, and to show that the RVs $\bar X$, $S^2$, and $(Y_1, Y_2, \ldots, Y_{n-2})$ are independent.


PROBLEMS 7.6

1. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, and let $\bar X$ and $S^2$, respectively, be the sample mean and the sample variance. Let $X_{n+1} \sim N(\mu, \sigma^2)$, and assume that $X_1, X_2, \ldots, X_n, X_{n+1}$ are independent. Find the sampling distribution of $[(X_{n+1} - \bar X)/S]\sqrt{n/(n + 1)}$.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Also, let $\alpha, \beta$ be two fixed real numbers. If $\bar X, \bar Y$ denote the corresponding sample means, what is the sampling distribution of

$\dfrac{\alpha(\bar X - \mu_1) + \beta(\bar Y - \mu_2)}{\sqrt{\dfrac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}}\sqrt{\dfrac{\alpha^2}{m} + \dfrac{\beta^2}{n}}},$

where $S_1^2$ and $S_2^2$, respectively, denote the sample variances of the X's and the Y's?

3. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, and let k be a positive integer. Find $E(S^{2k})$. In particular, find $E(S^2)$ and $\operatorname{var}(S^2)$.

4. A random sample of 5 is taken from a normal population with mean 2.5 and variance $\sigma^2 = 36$.
(a) Find the probability that the sample variance lies between 30 and 44.
(b) Find the probability that the sample mean lies between 1.3 and 3.5, while the sample variance lies between 30 and 44.

5. The mean life of a sample of 10 light bulbs was observed to be 1327 hours with a standard deviation of 425 hours. A second sample of 6 bulbs chosen from a different batch showed a mean life of 1215 hours with a standard deviation of 375 hours. If the means of the two batches are assumed to be the same, how probable is the observed difference between the two sample means?

6. Let $S_1^2$ and $S_2^2$ be the sample variances from two independent samples of sizes $n_1 = 5$ and $n_2 = 4$ from two populations having the same unknown variance $\sigma^2$. Find (approximately) the probability that $S_1^2/S_2^2 < 1/5.2$ or $> 6.25$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. By using the Helmert orthogonal transformation defined in Remark 3, show that $\bar X$ and $S^2$ are independent.

8. Derive the joint PDF of $\bar X$ and $S^2$ by using the transformation described in Remark 4.

7.7 SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION


Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population with parameters $\mu_1, \mu_2, \rho, \sigma_1^2, \sigma_2^2$. Let us write

$\bar X = n^{-1}\sum_{i=1}^n X_i, \qquad \bar Y = n^{-1}\sum_{i=1}^n Y_i,$

$S_1^2 = (n - 1)^{-1}\sum_{i=1}^n (X_i - \bar X)^2, \qquad S_2^2 = (n - 1)^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2,$

and

$S_{11} = (n - 1)^{-1}\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y).$

In this section we show that $(\bar X, \bar Y)$ is independent of $(S_1^2, S_{11}, S_2^2)$ and obtain the distribution of the sample correlation coefficient and regression coefficients (at least in the special case where $\rho = 0$).
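Theorem 1. The random vectors $(\bar X, \bar Y)$ and $(X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X, Y_1 - \bar Y, Y_2 - \bar Y, \ldots, Y_n - \bar Y)$ are independent. The joint distribution of $(\bar X, \bar Y)$ is bivariate normal with parameters $\mu_1, \mu_2, \rho, \sigma_1^2/n, \sigma_2^2/n$.

Proof. The proof follows along the lines of the proof of Theorem 7.6.1. The MGF of $(\bar X, \bar Y, X_1 - \bar X, \ldots, X_n - \bar X, Y_1 - \bar Y, \ldots, Y_n - \bar Y)$ is given by

$M^* = M(u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)$

$= E\exp\left\{u\bar X + v\bar Y + \sum_{i=1}^n t_i(X_i - \bar X) + \sum_{i=1}^n s_i(Y_i - \bar Y)\right\}$

$= E\exp\left\{\sum_{i=1}^n \left(\dfrac{u}{n} + t_i - \bar t\right)X_i + \sum_{i=1}^n \left(\dfrac{v}{n} + s_i - \bar s\right)Y_i\right\},$

where $\bar t = n^{-1}\sum_{i=1}^n t_i$ and $\bar s = n^{-1}\sum_{i=1}^n s_i$. Therefore,

$M^* = \prod_{i=1}^n E\exp\left\{\left(\dfrac{u}{n} + t_i - \bar t\right)X_i + \left(\dfrac{v}{n} + s_i - \bar s\right)Y_i\right\}$

$= \prod_{i=1}^n \exp\Big\{\mu_1\left(\dfrac{u}{n} + t_i - \bar t\right) + \mu_2\left(\dfrac{v}{n} + s_i - \bar s\right) + \dfrac{1}{2}\Big[\sigma_1^2\left(\dfrac{u}{n} + t_i - \bar t\right)^2 + 2\rho\sigma_1\sigma_2\left(\dfrac{u}{n} + t_i - \bar t\right)\left(\dfrac{v}{n} + s_i - \bar s\right) + \sigma_2^2\left(\dfrac{v}{n} + s_i - \bar s\right)^2\Big]\Big\}$

$= \exp\left\{\mu_1u + \mu_2v + \dfrac{u^2\sigma_1^2 + 2\rho\sigma_1\sigma_2uv + v^2\sigma_2^2}{2n}\right\} \times \exp\left\{\dfrac{1}{2}\left[\sigma_1^2\sum_{i=1}^n (t_i - \bar t)^2 + 2\rho\sigma_1\sigma_2\sum_{i=1}^n (t_i - \bar t)(s_i - \bar s) + \sigma_2^2\sum_{i=1}^n (s_i - \bar s)^2\right]\right\}$

$= M_1(u, v)\,M_2(t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)$

for all real $u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n$, where the cross terms have vanished because $\sum_{i=1}^n (t_i - \bar t) = \sum_{i=1}^n (s_i - \bar s) = 0$. Here $M_1$ is the MGF of $(\bar X, \bar Y)$ and $M_2$ is the MGF of $(X_1 - \bar X, \ldots, X_n - \bar X, Y_1 - \bar Y, \ldots, Y_n - \bar Y)$. Also, $M_1$ is the MGF of a bivariate normal distribution with parameters $\mu_1, \mu_2, \rho, \sigma_1^2/n, \sigma_2^2/n$. This completes the proof.

Corollary. The sample mean vector $(\bar X, \bar Y)$ is independent of the sample variance-covariance matrix $\begin{pmatrix} S_1^2 & S_{11} \\ S_{11} & S_2^2 \end{pmatrix}$ in sampling from a bivariate normal population.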
Remark 1. The result of Theorem 1 can be generalized to the case of sampling from a k-variate normal population. We do not propose to do so here.

Remark 2. Unfortunately, the method of proof of Theorem 1 does not lead to the distribution of the variance-covariance matrix. The distribution of $(\bar X, \bar Y, S_1^2, S_{11}, S_2^2)$ was found by Fisher [27] and Romanovsky [90]. The general case is due to Wishart [118], who determined the distribution of the sample variance-covariance matrix in sampling from a k-dimensional normal distribution. The distribution is named after him.


We will next compute the distribution of the sample correlation coefficient:

(1) $R = \dfrac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\left[\sum_{i=1}^n (X_i - \bar X)^2\sum_{i=1}^n (Y_i - \bar Y)^2\right]^{1/2}} = \dfrac{S_{11}}{S_1S_2}.$

It is convenient to introduce the sample regression coefficient of Y on X:

(2) $B_{Y|X} = R\,\dfrac{S_2}{S_1} = \dfrac{S_{11}}{S_1^2}.$

Since we will need only the distributions of R and $B_{Y|X}$ when $\rho = 0$, we make this simplifying assumption in what follows. The general case is computationally quite complicated. We refer the reader to Cramér [16] for details.

We note that

(3) $R = \dfrac{\sum_{i=1}^n Y_i(X_i - \bar X)}{(n - 1)S_1S_2}$

and

(4) $B_{Y|X} = \dfrac{\sum_{i=1}^n Y_i(X_i - \bar X)}{(n - 1)S_1^2}.$

Moreover,

(5) $R^2 = B_{Y|X}^2\,\dfrac{S_1^2}{S_2^2}.$

In the following we write $B = B_{Y|X}$.
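Relations (1) through (5) are algebraic identities and can be checked on any data set. The sketch below uses a small hypothetical sample and computes $S_1^2$, $S_2^2$, and $S_{11}$ with divisor $n - 1$; the printed pairs should agree up to rounding. The helper name `second_moments` is our own.

```python
import math

def second_moments(x, y):
    """Return S_1^2, S_2^2, S_11 (divisor n - 1), as defined at the start of Section 7.7."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s1sq = sum((a - xbar) ** 2 for a in x) / (n - 1)
    s2sq = sum((b - ybar) ** 2 for b in y) / (n - 1)
    s11 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    return s1sq, s2sq, s11

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 2.9, 5.2, 4.8, 8.0]
n = len(x)
s1sq, s2sq, s11 = second_moments(x, y)
r = s11 / math.sqrt(s1sq * s2sq)                 # (1)
b = s11 / s1sq                                   # (2)
xbar = sum(x) / n
# (3): Ybar drops out because the deviations X_i - Xbar sum to zero
r_alt = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) / ((n - 1) * math.sqrt(s1sq * s2sq))
print(r, r_alt)
print(b, r * math.sqrt(s2sq / s1sq))             # (2): B = R S_2 / S_1
print(r**2, b**2 * s1sq / s2sq)                  # (5)
```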


Theorem 2. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$, $n > 2$, be a sample from a bivariate normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$, and $\operatorname{cov}(X, Y) = 0$. In other words, let $X_1, X_2, \ldots, X_n$ be iid $N(\mu_1, \sigma_1^2)$ RVs, and $Y_1, Y_2, \ldots, Y_n$ be iid $N(\mu_2, \sigma_2^2)$ RVs, and suppose that the X's and Y's are independent. Then the PDF of R is given by

(6) $f_1(r) = \begin{cases} \dfrac{\Gamma[(n - 1)/2]}{\Gamma(1/2)\,\Gamma[(n - 2)/2]}\,(1 - r^2)^{(n - 4)/2}, & -1 < r < 1, \\ 0, & \text{otherwise}; \end{cases}$

and the PDF of B is given by

(7) $f_2(b) = \dfrac{\Gamma(n/2)}{\Gamma(1/2)\,\Gamma[(n - 1)/2]}\,\dfrac{\sigma_1\sigma_2^{n-1}}{(\sigma_2^2 + \sigma_1^2b^2)^{n/2}}, \quad -\infty < b < \infty.$


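Proof. Without any loss of generality, we assume that $\mu_1 = \mu_2 = 0$ and $\sigma_1^2 = \sigma_2^2 = 1$, for we can always define

(8) $X_i^* = \dfrac{X_i - \mu_1}{\sigma_1}, \qquad Y_i^* = \dfrac{Y_i - \mu_2}{\sigma_2}.$

Now note that the conditional distribution of $Y_i$, given $X_1, X_2, \ldots, X_n$, is N(0, 1), and $Y_1, Y_2, \ldots, Y_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent. Let us define the following orthogonal transformation:

(9) $u_i = \sum_{j=1}^n c_{ij}y_j, \quad i = 1, 2, \ldots, n,$

where $((c_{ij}))_{i, j = 1, 2, \ldots, n}$ is an orthogonal matrix with the first two rows

(10) $c_{1j} = \dfrac{1}{\sqrt{n}}, \quad j = 1, 2, \ldots, n,$

and

(11) $c_{2j} = \dfrac{x_j - \bar x}{\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^{1/2}}, \quad j = 1, 2, \ldots, n.$

It follows from orthogonality that for any $i \ge 2$,

(12) $\sum_{j=1}^n c_{ij} = \sqrt{n}\sum_{j=1}^n c_{ij}\,\dfrac{1}{\sqrt{n}} = \sqrt{n}\sum_{j=1}^n c_{ij}c_{1j} = 0$

and

(13) $\sum_{i=1}^n u_i^2 = \sum_{i=1}^n \left(\sum_{j=1}^n c_{ij}y_j\right)^2 = \sum_{j=1}^n\sum_{p=1}^n \left(\sum_{i=1}^n c_{ij}c_{ip}\right)y_jy_p = \sum_{j=1}^n y_j^2.$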
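Moreover,

(14) $u_1 = \sqrt{n}\,\bar y$

and

(15) $u_2 = b\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^{1/2},$

where b is a value assumed by RV B. Also, $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are normal RVs (being linear combinations of the Y's). Thus

(16) $E\{U_i \mid X_1, X_2, \ldots, X_n\} = \sum_{j=1}^n c_{ij}E\{Y_j \mid X_1, X_2, \ldots, X_n\} = 0$

and

$\operatorname{cov}\{U_i, U_k \mid X_1, \ldots, X_n\} = \operatorname{cov}\left\{\sum_{j=1}^n c_{ij}Y_j, \sum_{p=1}^n c_{kp}Y_p \,\Big|\, X_1, \ldots, X_n\right\} = \sum_{j=1}^n\sum_{p=1}^n c_{ij}c_{kp}\operatorname{cov}\{Y_j, Y_p \mid X_1, \ldots, X_n\} = \sum_{j=1}^n c_{ij}c_{kj}.$

This last equality follows since

$\operatorname{cov}\{Y_j, Y_p \mid X_1, \ldots, X_n\} = \begin{cases} 0, & j \ne p, \\ 1, & j = p. \end{cases}$

From orthogonality, we have

(17) $\operatorname{cov}\{U_i, U_k \mid X_1, \ldots, X_n\} = \begin{cases} 0, & i \ne k, \\ 1, & i = k, \end{cases}$

and it follows that the RVs $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent N(0, 1). Now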


(18) $\sum_{j=1}^n (Y_j - \bar Y)^2 = \sum_{j=1}^n Y_j^2 - n\bar Y^2 = \sum_{j=1}^n U_j^2 - U_1^2 = \sum_{j=2}^n U_j^2.$

Thus

(19) $R^2 = \dfrac{B^2S_1^2}{S_2^2} = \dfrac{U_2^2}{\sum_{j=2}^n U_j^2} = \dfrac{U_2^2}{U_2^2 + \sum_{j=3}^n U_j^2}.$

Writing $U = U_2^2$ and $W = \sum_{j=3}^n U_j^2$, we see that the conditional distribution of U, given $X_1, X_2, \ldots, X_n$, is $\chi^2(1)$, and that of W, given $X_1, X_2, \ldots, X_n$, is $\chi^2(n - 2)$. Moreover, U and W are conditionally independent. Since these conditional distributions do not involve the X's, we see that U and W are unconditionally independent with $\chi^2(1)$ and $\chi^2(n - 2)$ distributions, respectively. The joint PDF of U and W is

$\dfrac{1}{\Gamma(1/2)\,2^{1/2}}\,u^{1/2 - 1}e^{-u/2}\,\dfrac{1}{\Gamma[(n - 2)/2]\,2^{(n - 2)/2}}\,w^{(n - 2)/2 - 1}e^{-w/2}, \quad u > 0, \; w > 0.$


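Let $u + w = z$; then $u = r^2z$ and $w = z(1 - r^2)$. The Jacobian of this transformation is z, so that the joint PDF of $R^2$ and Z is given by

$f^*(r^2, z) = \dfrac{1}{2^{(n - 1)/2}\,\Gamma(1/2)\,\Gamma[(n - 2)/2]}\,z^{(n - 1)/2 - 1}e^{-z/2}\,(r^2)^{-1/2}(1 - r^2)^{(n - 2)/2 - 1}.$

The marginal PDF of $R^2$ is easily computed as

$f_1^*(r^2) = \dfrac{\Gamma[(n - 1)/2]}{\Gamma(1/2)\,\Gamma[(n - 2)/2]}\,(r^2)^{-1/2}(1 - r^2)^{(n - 2)/2 - 1}, \quad 0 < r^2 < 1.$

Finally, using Theorem 2.5.4, we get the PDF of R as

$f_1(r) = \dfrac{\Gamma[(n - 1)/2]}{\Gamma(1/2)\,\Gamma[(n - 2)/2]}\,(1 - r^2)^{(n - 4)/2}, \quad -1 < r < 1.$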


As for the distribution of B, note that the conditional PDF of $U_2 = \sqrt{n - 1}\,BS_1$, given $X_1, X_2, \ldots, X_n$, is N(0, 1), so that the conditional PDF of B, given $X_1, X_2, \ldots, X_n$, is $N\left(0, 1/\sum_{i=1}^n (X_i - \bar X)^2\right)$. Let us write $\Lambda = (n - 1)S_1^2$. Then the PDF of RV $\Lambda$ is that of a $\chi^2(n - 1)$ RV. Thus the joint PDF of B and $\Lambda$ is given by

(21) $h(b, \lambda) = g(b \mid \lambda)\,h_2(\lambda),$

where $g(b \mid \lambda)$ is $N(0, 1/\lambda)$ and $h_2(\lambda)$ is $\chi^2(n - 1)$. We have

(22) $h_1(b) = \displaystyle\int_0^\infty h(b, \lambda)\,d\lambda = \frac{1}{2^{n/2}\sqrt{\pi}\,\Gamma[(n - 1)/2]}\int_0^\infty \lambda^{n/2 - 1}e^{-\lambda(1 + b^2)/2}\,d\lambda = \frac{\Gamma(n/2)}{\Gamma(1/2)\,\Gamma[(n - 1)/2]}\,\frac{1}{(1 + b^2)^{n/2}}, \quad -\infty < b < \infty.$


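To complete the proof let us write

$X_i = \mu_1 + X_i^*\sigma_1 \quad \text{and} \quad Y_i = \mu_2 + Y_i^*\sigma_2,$

where $X_i^* \sim N(0, 1)$ and $Y_i^* \sim N(0, 1)$. Then $X_i \sim N(\mu_1, \sigma_1^2)$, $Y_i \sim N(\mu_2, \sigma_2^2)$, and

(23) $R = \dfrac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\left[\sum_{i=1}^n (X_i - \bar X)^2\sum_{i=1}^n (Y_i - \bar Y)^2\right]^{1/2}} = \dfrac{\sum_{i=1}^n (X_i^* - \bar X^*)(Y_i^* - \bar Y^*)}{\left[\sum_{i=1}^n (X_i^* - \bar X^*)^2\sum_{i=1}^n (Y_i^* - \bar Y^*)^2\right]^{1/2}},$

so that the PDF of R is the same as derived above. Also,

(24) $B = \dfrac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2} = \dfrac{\sigma_2}{\sigma_1}\,\dfrac{\sum_{i=1}^n (X_i^* - \bar X^*)(Y_i^* - \bar Y^*)}{\sum_{i=1}^n (X_i^* - \bar X^*)^2} = \dfrac{\sigma_2}{\sigma_1}\,B^*,$

where the PDF of $B^*$ is given by (22). Relations (23) and (24) are used to find the PDF of B. We leave the reader to carry out these simple details.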


Remark 3. In view of (23), namely the invariance of R under translation and (positive) scale changes, we note that for fixed n the sampling distribution of R, under $\rho = 0$, does not depend on $\mu_1, \mu_2, \sigma_1$, and $\sigma_2$. In the general case when $\rho \ne 0$, one can show that for fixed n the distribution of R depends only on $\rho$ but not on $\mu_1, \mu_2, \sigma_1$, and $\sigma_2$ (see, for example, Cramér [16, p. 398]).


Remark 4. Let us change the variable to

(25) $T = \dfrac{R}{\sqrt{1 - R^2}}\sqrt{n - 2}.$

Then

$1 - R^2 = \left(1 + \dfrac{T^2}{n - 2}\right)^{-1},$

and the PDF of T is given by

(26) $f(t) = \dfrac{1}{\sqrt{n - 2}\;B\left(\frac{1}{2}, \frac{n - 2}{2}\right)}\left(1 + \dfrac{t^2}{n - 2}\right)^{-(n - 1)/2}, \quad -\infty < t < \infty,$

which is the PDF of a t-statistic with n - 2 d.f. Thus T defined by (25) has a $t(n - 2)$ distribution, provided that $\rho = 0$. This result facilitates the computation of probabilities under the PDF of R when $\rho = 0$.


Remark 5. To compute the PDF of $B_{X|Y} = R(S_1/S_2)$, the sample regression coefficient of X on Y, all we need to do is to interchange $\sigma_1$ and $\sigma_2$ in (7).


Remark 6. From (7) we can compute the mean and variance of B. For n > 2, clearly,

$EB = 0,$

and for n > 3, we can show that

$EB^2 = \operatorname{var}(B) = \dfrac{\sigma_2^2}{\sigma_1^2}\,\dfrac{1}{n - 3}.$

Similarly, we can use (6) to compute the mean and variance of R. We have, for n > 2, under $\rho = 0$,

$ER = 0$

and

$ER^2 = \operatorname{var}(R) = \dfrac{1}{n - 1}.$
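The moments in Remark 6 can be compared with a simulation. The following sketch assumes independent $N(0, 4)$ and $N(0, 9)$ variables (so $\sigma_1 = 2$, $\sigma_2 = 3$, $\rho = 0$) and n = 10, all arbitrary illustrative choices, and the helper name is hypothetical. It should give $ER$ near 0, $ER^2$ near $1/(n - 1) = 1/9$, and $\operatorname{var}(B)$ near $(\sigma_2/\sigma_1)^2/(n - 3) = 2.25/7$.

```python
import random
import statistics

def simulate_r_and_b(n=10, sigma1=2.0, sigma2=3.0, reps=10000, seed=10):
    """Replicates of R and B = S_11 / S_1^2 for independent N(0, sigma1^2), N(0, sigma2^2) samples."""
    rng = random.Random(seed)
    rs, bs = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma1) for _ in range(n)]
        y = [rng.gauss(0.0, sigma2) for _ in range(n)]
        xbar, ybar = sum(x) / n, sum(y) / n
        sxx = sum((a - xbar) ** 2 for a in x)
        syy = sum((c - ybar) ** 2 for c in y)
        sxy = sum((a - xbar) * (c - ybar) for a, c in zip(x, y))
        rs.append(sxy / (sxx * syy) ** 0.5)   # (1); the factors (n - 1) cancel
        bs.append(sxy / sxx)                  # (2)
    return rs, bs

rs, bs = simulate_r_and_b()
er2 = statistics.fmean(r * r for r in rs)
var_b = statistics.variance(bs)
print(f"E R ~ {statistics.fmean(rs):.4f}, E R^2 ~ {er2:.4f} (1/9 = {1 / 9:.4f})")
print(f"var B ~ {var_b:.4f} (2.25/7 = {2.25 / 7:.4f})")
```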
PROBLEMS 7.7

1. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal population with $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \operatorname{var}(Y) = \sigma^2$, and $\operatorname{cov}(X, Y) = \rho\sigma^2$. Let $\bar X, \bar Y$ denote the corresponding sample means, $S_1^2, S_2^2$ the corresponding sample variances, and $S_{11}$ the sample covariance. Write $R = 2S_{11}/(S_1^2 + S_2^2)$. Show that the PDF of R is given by

$f(r) = \dfrac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma[(n - 1)/2]}\,(1 - \rho^2)^{(n - 1)/2}(1 - \rho r)^{-(n - 1)}(1 - r^2)^{(n - 3)/2}, \quad |r| < 1.$

[Hint: Let $U = (X + Y)/2$ and $V = (X - Y)/2$, and observe that the random vector (U, V) is also bivariate normal. In fact, U and V are independent.] (Rastogi [87])

2. Let X and Y be independent normal RVs. A sample of n = 11 observations on (X, Y) produces sample correlation coefficient r = 0.40. Find the probability of obtaining a value of R that exceeds the observed value.

3. Let $X_1, X_2$ be jointly normally distributed with zero means, unit variances, and correlation coefficient $\rho$. Let S be a $\chi^2(n)$ RV that is independent of $(X_1, X_2)$. Then the joint distribution of $Y_1 = X_1/\sqrt{S/n}$ and $Y_2 = X_2/\sqrt{S/n}$ is known as a central bivariate t-distribution. Find the joint PDF of $(Y_1, Y_2)$ and the marginal PDFs of $Y_1$ and $Y_2$, respectively.

4. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with parameters $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \operatorname{var}(Y_i) = \sigma^2$, and $\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$, $i = 1, 2, \ldots, n$. Find the distribution of the statistic

$T(\mathbf{X}, \mathbf{Y}) = \sqrt{n}\,\dfrac{(\bar X - \mu_1) - (\bar Y - \mu_2)}{\sqrt{\sum_{i=1}^n (X_i - Y_i - \bar X + \bar Y)^2}}.$


CHAPTER 8 


Parametric Point Estimation 


8.1 INTRODUCTION 


In this chapter we study the theory of point estimation. Suppose, for example, that a random variable X is known to have a normal distribution $N(\mu, \sigma^2)$, but we do not know one of the parameters, say $\mu$. Suppose further that a sample $X_1, X_2, \ldots, X_n$ is taken on X. The problem of point estimation is to pick a (one-dimensional) statistic $T(X_1, X_2, \ldots, X_n)$ that best estimates the parameter $\mu$. The numerical value of T when the realization is $x_1, x_2, \ldots, x_n$ is frequently called an estimate of $\mu$, while the statistic T is called an estimator of $\mu$. If both $\mu$ and $\sigma^2$ are unknown, we seek a joint statistic $T = (U, V)$ as an estimator of $(\mu, \sigma^2)$.

In Section 8.2 we formally describe the problem of parametric point estimation. 
Since the class of all estimators in most problems is too large, it is not possible to find 
the "best" estimator in this class. One narrows the search somewhat by requiring that 
the estimators have some specified desirable properties. We describe some of these 
and also outline some criteria for comparing estimators. 

Section 8.3 deals, in detail, with some important properties of statistics, such as 
sufficiency, completeness, and ancillarity. We use these properties in later sections to 
facilitate our search for optimal estimators. Sufficiency, completeness, and ancillarity 
also have applications in other branches of statistical inference, such as testing of 
hypotheses and nonparametric theory. 

In Section 8.4 we investigate the criterion of unbiased estimation and study meth- 
ods for obtaining optimal estimators in the class of unbiased estimators. In Section 
8.5 we derive two lower bounds for the variance of an unbiased estimator. These bounds
can sometimes help in obtaining the "best" unbiased estimator. 

In Section 8.6 we describe one of the oldest methods of estimation, and in Section 
8.7 we study the method of maximum likelihood estimation and its large-sample 
properties. Section 8.8 is devoted to Bayes and minimax estimation, and Section 8.9 
deals with equivariant estimation. 
8.2 PROBLEM OF POINT ESTIMATION

Let X be an RV defined on a probability space $(\Omega, \mathcal{S}, P)$. Suppose that the DF F of X depends on a certain number of parameters, and suppose further that the functional form of F is known except perhaps for a finite number of these parameters. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the unknown parameter associated with F.


Definition 1. The set of all admissible values of the parameters of a DF F is 
called the parameter space. 


Let $X = (X_1, X_2, \ldots, X_n)$ be an RV with DF $F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is a vector of unknown parameters, $\theta \in \Theta$. Let $\psi$ be a real-valued function on $\Theta$. In this chapter we investigate the problem of approximating $\psi(\theta)$ on the basis of the observed value x of X.


Definition 2. Let $X = (X_1, X_2, \ldots, X_n) \sim P_\theta$, $\theta \in \Theta$. A statistic $\delta(X)$ is said to be a (point) estimator of $\psi$ if $\delta: \mathcal{X} \to \Theta$, where $\mathcal{X}$ is the space of values of X.

The problem of point estimation is to find an estimator $\delta$ for the unknown parametric function $\psi(\theta)$ that has some nice properties. The value $\delta(x)$ of $\delta(X)$ for the data x is called the estimate of $\psi(\theta)$.

In most problems $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F_\theta$.

Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$, where $\Theta = \{\theta > 0\}$ and $\theta$ is to be estimated. Then $\mathcal{X} = \mathcal{R}_n^+$, and any map $\delta: \mathcal{X} \to (0, \infty)$ is an estimator of $\theta$. Some typical estimators of $\theta$ are $\bar X = n^{-1}\sum_{j=1}^n X_j$ and $\{2/[n(n + 1)]\}\sum_{j=1}^n jX_j$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, where $p \in [0, 1]$. Then $\bar X$ is an estimator of p, and so also are $\delta_1(X) = X_1$, $\delta_2(X) = (X_1 + X_2)/2$, and $\delta_3(X) = \sum_{j=1}^n a_jX_j$, where $0 \le a_j \le 1$, $\sum_{j=1}^n a_j = 1$.


It is clear that in any given problem of estimation we may have a large, often an infinite, class of appropriate estimators to choose from. Clearly, we would like the estimator $\delta$ to be close to $\psi(\theta)$. Since $\delta$ is a statistic, the usual measure of closeness $|\delta(X) - \psi(\theta)|$ is also an RV, so we interpret "$\delta$ close to $\psi$" to mean "close on the average." Examples of such measures of closeness are

(1) $P_\theta\{|\delta(X) - \psi(\theta)| < \varepsilon\}$

for some $\varepsilon > 0$, and

(2) $E_\theta|\delta(X) - \psi(\theta)|^r$

for some r > 0. Obviously, we want (1) to be large but (2) to be small. For r = 2, the quantity defined in (2) is called the mean square error, and we denote it by

(3) $\mathrm{MSE}_\theta(\delta) = E_\theta\{\delta(X) - \psi(\theta)\}^2.$
Among all estimators for $\psi$, we would like to choose one, say $\delta_0$, such that

(4) $P_\theta\{|\delta_0(X) - \psi(\theta)| < \varepsilon\} \ge P_\theta\{|\delta(X) - \psi(\theta)| < \varepsilon\}$

for all $\delta$, all $\varepsilon > 0$, and all $\theta$. For (2), the requirement is to choose $\delta_0$ such that

(5) $\mathrm{MSE}_\theta(\delta_0) \le \mathrm{MSE}_\theta(\delta)$

for all $\delta$ and all $\theta \in \Theta$. Estimators satisfying (4) or (5) do not generally exist.

We note that

$\mathrm{MSE}_\theta(\delta) = E_\theta[\delta(X) - E_\theta\delta(X)]^2 + [E_\theta\delta(X) - \psi(\theta)]^2$

(6) $= \operatorname{var}_\theta\delta(X) + [b(\delta, \psi)]^2,$

where

(7) $b(\delta, \psi) = E_\theta\delta(X) - \psi(\theta)$

is called the bias of $\delta$. An estimator has small MSE only if both its bias and its variance are small. To control MSE, we need to control both variance and bias.
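Decomposition (6) can be verified exactly for estimators of a Bernoulli p, since such estimators depend on $S = \sum X_i$, which is $b(n, p)$. The sketch below (helper names hypothetical; n = 10 and p = 0.2 chosen arbitrarily) computes the variance, bias, and MSE of $\bar X$ and of $(S + 1)/(n + 2)$, the estimator that also appears in Example 3 below, by enumeration. The second estimator is biased, yet at this p its smaller variance more than compensates, so it has the smaller MSE; this illustrates why controlling MSE requires attention to both terms.

```python
from math import comb

def mse_parts(estimator, n, p):
    """Exact (variance, bias, MSE) of estimator(S) for S ~ b(n, p), with psi(p) = p."""
    pmf = [comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(n + 1)]
    vals = [estimator(s) for s in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, vals))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, vals))
    mse = sum(w * (v - p) ** 2 for w, v in zip(pmf, vals))
    return var, mean - p, mse

n, p = 10, 0.2
estimators = {
    "Xbar": lambda s: s / n,
    "(S + 1)/(n + 2)": lambda s: (s + 1) / (n + 2),
}
for name, est in estimators.items():
    var, bias, mse = mse_parts(est, n, p)
    print(f"{name}: var {var:.5f}, bias {bias:.5f}, MSE {mse:.5f}, var + bias^2 {var + bias**2:.5f}")
```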
One approach is to restrict attention to estimators that have zero bias, that is,

(8) $E_\theta\delta(X) = \psi(\theta)$ for all $\theta \in \Theta$.

The condition of unbiasedness (8) ensures that, on average, the estimator $\delta$ has no systematic error; it neither over- nor underestimates $\psi$ on average. If we restrict attention to the class of unbiased estimators, we need to find an estimator $\delta_0$ in this class such that $\delta_0$ has the least variance for all $\theta \in \Theta$. The theory of unbiased estimation is developed in Section 8.4.

Another approach is to replace $|\delta - \psi|^r$ in (2) by a more general function. Let $L(\theta, \delta)$ measure the loss in estimating $\psi$ by $\delta$. Assume that L, the loss function, satisfies $L(\theta, \delta) \ge 0$ for all $\theta$ and $\delta$, and $L(\theta, \psi(\theta)) = 0$ for all $\theta$. Measure average loss by the risk function

(9) $R(\theta, \delta) = E_\theta L(\theta, \delta(X)).$

Instead of seeking an estimator that minimizes R, the risk, uniformly in $\theta$, we minimize

(10) $\displaystyle\int_\Theta R(\theta, \delta)\,\pi(\theta)\,d\theta$

for some weight function $\pi$ on $\Theta$, or minimize

(11) $\displaystyle\sup_{\theta \in \Theta} R(\theta, \delta).$
The estimator that minimizes the average risk defined in (10) leads to the Bayes estimator, and the estimator that minimizes (11) leads to the minimax estimator. Bayes and minimax estimation are discussed in Section 8.8.

Sometimes there are symmetries in the problem which may be used to restrict attention to estimators that exhibit the same symmetry. Consider, for example, an experiment in which the length of life of a light bulb is measured. Then an estimator obtained from the measurements expressed in hours must agree with an estimator obtained from the measurements expressed in minutes. If X represents measurements in original units (hours) and Y represents corresponding measurements in transformed units (minutes), then Y = cX (here c = 60). If $\delta(X)$ is an estimator of the true mean, we would expect $\delta(Y)$, the estimator of the true mean in minutes, to correspond to $\delta(X)$ according to the relation $\delta(Y) = c\delta(X)$. That is, $\delta(cX) = c\delta(X)$ for all c > 0. This is an example of an equivariant estimator, a topic discussed extensively in Section 8.9.

Finally, we consider some large-sample properties of estimators. As the sample size $n \to \infty$, the data x are practically the whole population, and we should expect $\delta(X)$ to approach $\psi(\theta)$ in some sense. For example, if $\delta(X) = \bar X$, $\psi(\theta) = E_\theta X_1$, and $X_1, X_2, \ldots, X_n$ are iid RVs with finite mean, the strong law of large numbers tells us that $\bar X \to E_\theta X_1$ with probability 1. This property of a sequence of estimators is called consistency.


Definition 3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F_\theta$, $\theta \in \Theta$. A sequence of point estimators $T_n(X_1, X_2, \ldots, X_n) = T_n$ will be called consistent for $\psi(\theta)$ if

$T_n \xrightarrow{P} \psi(\theta)$ as $n \to \infty$

for each fixed $\theta \in \Theta$.

Remark 1. Recall that $T_n \xrightarrow{P} \psi(\theta)$ if and only if $P\{|T_n - \psi(\theta)| > \varepsilon\} \to 0$ as $n \to \infty$ for every $\varepsilon > 0$. One can similarly define strong consistency of a sequence of estimators $T_n$ if $T_n \xrightarrow{a.s.} \psi(\theta)$. Sometimes, one speaks of consistency in the rth mean when $T_n \xrightarrow{r} \psi(\theta)$. In what follows, consistency will mean weak consistency of $T_n$ for $\psi(\theta)$, that is, $T_n \xrightarrow{P} \psi(\theta)$.

It is important to remember that consistency is a large-sample property. Moreover, we speak of consistency of a sequence of estimators rather than of one point estimator.


Example 3. Let $X_1, X_2, \ldots$ be iid $b(1, p)$ RVs. Then $EX_1 = p$, and it follows by the WLLN that

$\dfrac{\sum_{i=1}^n X_i}{n} \xrightarrow{P} p.$

Thus $\bar X$ is consistent for p. Also, $\left(\sum_{i=1}^n X_i + 1\right)/(n + 2) \xrightarrow{P} p$, so that a consistent estimator need not be unique. Indeed, if $T_n \xrightarrow{P} p$ and $c_n \to 0$ as $n \to \infty$, then

$T_n + c_n \xrightarrow{P} p.$


Theorem 1. If $X_1, X_2, \ldots$ are iid RVs with common law $\mathcal{L}(X)$, and $E|X|^p < \infty$ for some positive integer p, then

$\dfrac{\sum_{i=1}^n X_i^k}{n} \xrightarrow{P} EX^k \quad \text{for } 1 \le k \le p,$

and $n^{-1}\sum_{i=1}^n X_i^k$ is consistent for $EX^k$, $1 \le k \le p$. Moreover, if $c_n$ is any sequence of constants such that $c_n \to 0$ as $n \to \infty$, then $\left(n^{-1}\sum_{i=1}^n X_i^k + c_n\right)$ is also consistent for $EX^k$, $1 \le k \le p$. Also, if $c_n \to 1$ as $n \to \infty$, then $\left(c_nn^{-1}\sum_{i=1}^n X_i^k\right)$ is consistent for $EX^k$. This is simply a restatement of the WLLN for iid RVs.


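Example 4. Let $X_1, X_2, \ldots$ be iid $N(\mu, \sigma^2)$ RVs. If $S^2$ is the sample variance, we know that $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$. Thus $E(S^2/\sigma^2) = 1$ and $\operatorname{var}(S^2/\sigma^2) = 2/(n - 1)$. It follows by Chebyshev's inequality that

$P\{|S^2 - \sigma^2| > \varepsilon\} \le \dfrac{\operatorname{var}(S^2)}{\varepsilon^2} = \dfrac{2\sigma^4}{\varepsilon^2(n - 1)} \to 0$ as $n \to \infty$.

Thus $S^2 \xrightarrow{P} \sigma^2$. Actually, this result holds for any sequence of iid RVs with $E|X|^2 < \infty$ and can be obtained from Theorem 1.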
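The Chebyshev bound in Example 4 is valid but can be far from sharp. The sketch below, with assumed values $\sigma = 1$, $\varepsilon = 0.5$, and 2000 replications (illustrative choices; helper names hypothetical), estimates $P\{|S^2 - \sigma^2| > \varepsilon\}$ for several n and prints it alongside $2\sigma^4/[\varepsilon^2(n - 1)]$. Both decrease toward 0, in line with the consistency of $S^2$, but the simulated probabilities typically sit well below the bound.

```python
import random

def sample_variance(xs):
    """S^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def tail_prob_s2(n, eps=0.5, sigma=1.0, reps=2000, seed=12):
    """Monte Carlo estimate of P{|S^2 - sigma^2| > eps} for N(0, sigma^2) samples of size n."""
    rng = random.Random(seed)
    hits = sum(
        abs(sample_variance([rng.gauss(0.0, sigma) for _ in range(n)]) - sigma**2) > eps
        for _ in range(reps)
    )
    return hits / reps

def chebyshev_bound(n, eps=0.5, sigma=1.0):
    """Bound of Example 4: 2 sigma^4 / (eps^2 (n - 1))."""
    return 2 * sigma**4 / (eps**2 * (n - 1))

results = [(n, tail_prob_s2(n), chebyshev_bound(n)) for n in (10, 50, 200)]
for n, prob, bound in results:
    print(f"n = {n:3d}: simulated {prob:.3f}, bound {bound:.3f}")
```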


Example 4 is a particular case of the following theorem.

Theorem 2. If $T_n$ is a sequence of estimators such that $ET_n \to \psi(\theta)$ and $\operatorname{var}(T_n) \to 0$ as $n \to \infty$, then $T_n$ is consistent for $\psi(\theta)$.

Proof. We have

$P\{|T_n - \psi(\theta)| > \varepsilon\} \le \varepsilon^{-2}E[T_n - ET_n + ET_n - \psi(\theta)]^2 = \varepsilon^{-2}\{\operatorname{var}(T_n) + [ET_n - \psi(\theta)]^2\} \to 0$ as $n \to \infty$.

Other large-sample properties of estimators are asymptotic unbiasedness, asymptotic normality, and asymptotic efficiency. A sequence of estimators $\{T_n\}$ is asymptotically unbiased for $\psi(\theta)$ if

$\lim_{n \to \infty} E_\theta T_n(X) = \psi(\theta)$

for all $\theta$. A consistent sequence of estimators $\{T_n\}$ is said to be consistent asymptotically normal (CAN) for $\psi(\theta)$ if $T_n \sim AN(\psi(\theta), v(\theta)/n)$ for all $\theta \in \Theta$. If $v(\theta) = 1/I(\theta)$, where $I(\theta)$ is the Fisher information (Section 8.7), then $\{T_n\}$ is known as a best asymptotically normal (BAN) estimator.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$ RVs. Then $T_n = \sum_{i=1}^n X_i/(n + 1)$ is asymptotically unbiased for $\theta$ and is a BAN estimator for $\theta$ with $v(\theta) = 1$.


In Section 8.7 we consider large-sample properties of maximum likelihood esti- 
mators, and in Section 8.5 asymptotic efficiency is introduced. 


PROBLEMS 8.2

1. Suppose that $T_n$ is a sequence of estimators for parameter $\theta$ that satisfies the conditions of Theorem 2. Then $T_n \xrightarrow{2} \theta$, that is, $T_n$ is squared-error consistent for $\theta$. If $T_n$ is consistent for $\theta$ and $|T_n - \theta| \le A < \infty$ for all $\theta$ and all $(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$, show that $T_n \xrightarrow{2} \theta$. If, however, $|T_n - \theta| \le A_n < \infty$, show that $T_n$ may not be squared-error consistent for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$, $\theta \in \Theta = (0, \infty)$. Let $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. Show that $X_{(n)} \xrightarrow{P} \theta$. Write $Y_n = 2\bar X$. Is $Y_n$ consistent for $\theta$?

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_i = \mu$ and $E|X_i|^2 < \infty$. Show that $T(X_1, X_2, \ldots, X_n) = 2[n(n + 1)]^{-1}\sum_{i=1}^n iX_i$ is a consistent estimator for $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$. Show that $T(X_1, X_2, \ldots, X_n) = \left(\prod_{i=1}^n X_i\right)^{1/n}$ is a consistent estimator for $\theta e^{-1}$.

5. In Problem 2, show that $T(X) = X_{(n)}$ is asymptotically unbiased for $\theta$ and is not BAN. [Show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.]

6. In Problem 5, consider the class of estimators $T(X) = cX_{(n)}$, c > 0. Show that the estimator $T_0(X) = (n + 2)X_{(n)}/(n + 1)$ in this class has the least MSE.

7. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp\{-(x - \theta)\}$, $x > \theta$. Consider the class of estimators $T(X) = X_{(1)} + b$, $b \in \mathcal{R}$. Show that the estimator that has the smallest MSE in this class is given by $T(X) = X_{(1)} - 1/n$.


8.3 SUFFICIENCY, COMPLETENESS, AND ANCILLARITY 


After the completion of any experiment, the job of a statistician is to interpret the 
data she has collected and to draw some statistically valid conclusions about the 
population under investigation. In addition to being costly to store, the raw data by
themselves are not suitable for this purpose. Therefore, the statistician would like to
condense the data by computing some statistics from them and to base her analysis
on these statistics, provided that there is "no loss of information" in doing so. In
many problems of statistical inference a function of the observations contains as
much information about the unknown parameter as do all the observed values. The 
following example illustrates this point. 


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, where $\mu$ is unknown. Suppose that we transform variables $X_1, X_2, \ldots, X_n$ to $Y_1, Y_2, \ldots, Y_n$ with the help of an orthogonal transformation so that $Y_1$ is $N(\sqrt{n}\,\mu, 1)$, $Y_2, \ldots, Y_n$ are iid $N(0, 1)$, and $Y_1, Y_2, \ldots, Y_n$ are independent. (Take $y_1 = \sqrt{n}\,\bar{x}$, and for $k = 2, \ldots, n$, $y_k = [(k-1)x_k - (x_1 + \cdots + x_{k-1})]/\sqrt{k(k-1)}$.) To estimate $\mu$ we can use either the observed values of $X_1, X_2, \ldots, X_n$ or simply the observed value of $Y_1 = \sqrt{n}\,\bar{X}$. The RVs $Y_2, \ldots, Y_n$ provide no information about $\mu$. Clearly, $Y_1$ is preferable since one need not keep a record of all the observations; it suffices to accumulate the observations and compute $y_1$. Any analysis of the data based on $y_1$ is just as effective as any analysis that could be based on the $x_i$'s. We note that $Y_1$ takes values in $\mathcal{R}_1$, whereas $(X_1, X_2, \ldots, X_n)$ takes values in $\mathcal{R}_n$.
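A quick numerical check of the transformation in Example 1, as a sketch in plain Python (`helmert` is a hypothetical helper name; the seed, mean 2.5, and sample size 6 are arbitrary). It confirms that the map preserves sums of squares, as an orthogonal map must, that $y_1 = \sqrt{n}\,\bar{x}$, and that adding the same constant to every $x_i$ (a shift in $\mu$) leaves $y_2, \ldots, y_n$ unchanged.

```python
import math
import random


def helmert(x):
    """Orthogonal transformation of Example 1: x_1..x_n -> y_1..y_n."""
    n = len(x)
    y = [math.sqrt(n) * (sum(x) / n)]   # y_1 = sqrt(n) * xbar
    for k in range(2, n + 1):
        head = sum(x[: k - 1])           # x_1 + ... + x_{k-1}
        y.append(((k - 1) * x[k - 1] - head) / math.sqrt(k * (k - 1)))
    return y


rng = random.Random(0)
x = [rng.gauss(2.5, 1.0) for _ in range(6)]   # one N(2.5, 1) sample of size 6
y = helmert(x)
y_shifted = helmert([xi + 10.0 for xi in x])  # same data with mu increased by 10

print(sum(v * v for v in x), sum(v * v for v in y))                 # equal up to rounding
print(max(abs(a - b) for a, b in zip(y[1:], y_shifted[1:])))       # ~0: Y_2..Y_n ignore the shift
```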


A rigorous definition of the concept involved in the discussion above requires the 
notion of a conditional distribution and is beyond the scope of this book. In view of 
the discussion of conditional probability distributions in Section 4.2, the following 
definition will suffice for our purposes. 


Definition 1. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $\{F_\theta: \theta \in \Theta\}$. A statistic $T = T(\mathbf{X})$ is sufficient for $\theta$ or for the family of distributions $\{F_\theta: \theta \in \Theta\}$ if and only if the conditional distribution of $\mathbf{X}$, given $T = t$, does not depend on $\theta$ (except perhaps for a null set $A$, $P_\theta\{T \in A\} = 0$ for all $\theta$).


Remark 1. The outcome $X_1, X_2, \ldots, X_n$ is always sufficient, but we will exclude this trivial statistic from consideration. According to Definition 1, if $T$ is sufficient for $\theta$, we need only concentrate on $T$ since it exhausts all the information that the sample has about $\theta$. In practice, there will be several sufficient statistics for a family of distributions, and the question arises as to which of these should be used in a given problem. We will return to this topic in more detail later in this section.


Example 2. We show that the statistic $Y_1$ in Example 1 is sufficient for $\mu$. By construction $Y_2, \ldots, Y_n$ are iid $N(0, 1)$ RVs that are independent of $Y_1$. Hence the conditional distribution of $Y_2, \ldots, Y_n$, given $Y_1 = \sqrt{n}\,\bar{X}$, is the same as the unconditional distribution of $(Y_2, \ldots, Y_n)$, which is multivariate normal with mean $(0, 0, \ldots, 0)$ and dispersion matrix $\mathbf{I}_{n-1}$. Since this distribution is independent of $\mu$, the conditional distribution of $(Y_1, Y_2, \ldots, Y_n)$, and hence of $(X_1, X_2, \ldots, X_n)$, given $Y_1 = y_1$, is also independent of $\mu$, and $Y_1$ is sufficient.


Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Intuitively, if a loaded coin with probability $p$ of heads is tossed $n$ times, it seems unnecessary to know which toss resulted in a head. To estimate $p$, it should be sufficient to know the number of heads in $n$ trials. We show that this is consistent with our definition. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$. Then
$$P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \begin{cases} \dfrac{P\{X_1 = x_1, \ldots, X_n = x_n, T = t\}}{P\{T = t\}} & \text{if } \sum_{i=1}^n x_i = t,\\ 0 & \text{otherwise.} \end{cases}$$
Thus, for $\sum_{i=1}^n x_i = t$, we have
$$P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \frac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{\binom{n}{t}p^t(1-p)^{n-t}} = \binom{n}{t}^{-1},$$
which is independent of $p$. It is therefore sufficient to concentrate on $\sum_{i=1}^n X_i$.
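The computation in Example 3 can be confirmed by exact enumeration of $\{0, 1\}^n$ for a small $n$. In this sketch (plain Python; `conditional_given_sum` is a hypothetical name, and $n = 4$ with $p = 1/5, 2/3$ are arbitrary choices) every conditional probability comes out as $1/\binom{n}{t}$, whichever $p$ is used.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_given_sum(n, p):
    """Map each x in {0,1}^n to P_p{X = x | T = sum(x)} for iid b(1, p) RVs."""
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in product((0, 1), repeat=n)}
    p_t = {}                                 # P_p{T = t}, accumulated over the sample space
    for x, prob in joint.items():
        p_t[sum(x)] = p_t.get(sum(x), 0) + prob
    return {x: prob / p_t[sum(x)] for x, prob in joint.items()}


n = 4
for p in (Fraction(1, 5), Fraction(2, 3)):
    cond = conditional_given_sum(n, p)
    # Every conditional probability equals 1/C(n, t), whatever p is.
    print(p, all(v == Fraction(1, comb(n, sum(x))) for x, v in cond.items()))
```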
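Example 4. Let $X_1, X_2$ be iid $P(\lambda)$ RVs. Then $X_1 + X_2$ is sufficient for $\lambda$, for
$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \begin{cases} \dfrac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{X_1 + X_2 = t\}} & \text{if } t = x_1 + x_2,\ x_i = 0, 1, 2, \ldots,\\ 0 & \text{otherwise.} \end{cases}$$
Thus, for $x_i = 0, 1, 2, \ldots$, $i = 1, 2$, $x_1 + x_2 = t$, since $X_1 + X_2 \sim P(2\lambda)$, we have
$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \frac{e^{-\lambda}\lambda^{x_1}/x_1! \cdot e^{-\lambda}\lambda^{t - x_1}/(t - x_1)!}{e^{-2\lambda}(2\lambda)^t/t!} = \binom{t}{x_1}\left(\frac{1}{2}\right)^t,$$
which is independent of $\lambda$.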
Not every statistic is sufficient. 


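Example 5. Let $X_1, X_2$ be iid $P(\lambda)$ RVs, and consider the statistic $T = X_1 + 2X_2$. We have
$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}} = \frac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda} + (\lambda^2/2)e^{-2\lambda}} = \frac{1}{1 + \lambda/2},$$
and we see that $X_1 + 2X_2$ is not sufficient for $\lambda$.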
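Examples 4 and 5 can be compared numerically. The sketch below (plain Python; `cond_prob`, `total`, and `weighted` are hypothetical names, and the search bound `k_max` is an arbitrary cutoff) computes the conditional probability of an observed pair given the value of each statistic for several $\lambda$. For $X_1 + X_2$ it stays at $\binom{3}{1}2^{-3} = 0.375$; for $X_1 + 2X_2$ it tracks $1/(1 + \lambda/2)$.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def cond_prob(x1, x2, stat, lam, k_max=60):
    """P{X1 = x1, X2 = x2 | stat(X1, X2) = stat(x1, x2)} for iid P(lam) RVs.

    The denominator sums over all pairs (a, b) with 0 <= a, b < k_max hitting the same
    value of the statistic; k_max only needs to exceed that value for the
    nonnegative-integer statistics used here.
    """
    t = stat(x1, x2)
    denom = sum(poisson_pmf(a, lam) * poisson_pmf(b, lam)
                for a in range(k_max) for b in range(k_max) if stat(a, b) == t)
    return poisson_pmf(x1, lam) * poisson_pmf(x2, lam) / denom


def total(a, b):       # X1 + X2 (Example 4)
    return a + b


def weighted(a, b):    # X1 + 2 X2 (Example 5)
    return a + 2 * b


for lam in (0.5, 2.0, 7.0):
    # first value stays at C(3,1)/2^3 = 0.375; second equals 1/(1 + lam/2)
    print(lam, cond_prob(1, 2, total, lam), cond_prob(0, 1, weighted, lam))
```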


Definition 1 is not a constructive definition since it requires that we first guess a
statistic $T$ and then check to see whether $T$ is sufficient. Moreover, the procedure for
checking that $T$ is sufficient is quite time consuming. We now give a criterion for
determining sufficient statistics.


Theorem 1 (Factorization Criterion). Let $X_1, X_2, \ldots, X_n$ be discrete RVs with PMF $p_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. Then $T(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if and only if we can write
$$p_\theta(x_1, x_2, \ldots, x_n) = h(x_1, x_2, \ldots, x_n)\,g_\theta(T(x_1, x_2, \ldots, x_n)), \tag{1}$$
where $h$ is a nonnegative function of $x_1, x_2, \ldots, x_n$ only and does not depend on $\theta$, and $g_\theta$ is a nonnegative nonconstant function of $\theta$ and $T(x_1, x_2, \ldots, x_n)$ only. The statistic $T(X_1, \ldots, X_n)$ and parameter $\theta$ may be multidimensional.


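Proof. Let $T$ be sufficient for $\theta$. Then $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is independent of $\theta$, and we may write
$$P_\theta\{\mathbf{X} = \mathbf{x}\} = P_\theta\{\mathbf{X} = \mathbf{x}, T(X_1, X_2, \ldots, X_n) = t\} = P_\theta\{T = t\}\,P\{\mathbf{X} = \mathbf{x} \mid T = t\},$$
where $t = T(\mathbf{x})$, provided that $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is well defined.


For values of $\mathbf{x}$ for which $P_\theta\{\mathbf{X} = \mathbf{x}\} = 0$ for all $\theta$, let us define $h(x_1, x_2, \ldots, x_n) = 0$, and for $\mathbf{x}$ for which $P_\theta\{\mathbf{X} = \mathbf{x}\} > 0$ for some $\theta$, we define
$$h(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$$
and define
$$g_\theta(T(x_1, x_2, \ldots, x_n)) = P_\theta\{T(\mathbf{X}) = t\}.$$
Thus we see that (1) holds.

Conversely, suppose that (1) holds. Then for fixed $t_0$ we have
$$P_\theta\{T = t_0\} = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t_0} P_\theta\{\mathbf{X} = \mathbf{x}\} = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t_0} g_\theta(T(\mathbf{x}))h(\mathbf{x}) = g_\theta(t_0)\sum_{\mathbf{y}:\, T(\mathbf{y}) = t_0} h(\mathbf{y}).$$
Suppose that $P_\theta\{T = t_0\} > 0$ for some $\theta$. Then
$$P_\theta\{\mathbf{X} = \mathbf{x} \mid T = t_0\} = \frac{P_\theta\{\mathbf{X} = \mathbf{x}, T(\mathbf{X}) = t_0\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \begin{cases} 0 & \text{if } T(\mathbf{x}) \ne t_0,\\ \dfrac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} & \text{if } T(\mathbf{x}) = t_0. \end{cases}$$
Thus, if $T(\mathbf{x}) = t_0$, then
$$\frac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \frac{g_\theta(t_0)h(\mathbf{x})}{g_\theta(t_0)\sum_{\mathbf{y}:\, T(\mathbf{y}) = t_0} h(\mathbf{y})} = \frac{h(\mathbf{x})}{\sum_{\mathbf{y}:\, T(\mathbf{y}) = t_0} h(\mathbf{y})},$$
which is free of $\theta$, as asserted. This completes the proof.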


Remark 2. Theorem 1 also holds for the continuous case and, indeed, for quite 
arbitrary families of distributions. The general proof is beyond the scope of this book, 
and we refer the reader to Halmos and Savage [38] or to Lehmann [63, pp. 53—56]. 
We will assume that the result holds for the absolutely continuous case. We leave it to
the reader to write the analog of (1) and to prove it, at least under the regularity
conditions assumed in Theorem 4.4.2.


Remark 3. Theorem 1 (and its analog for the continuous case) holds if $\theta$ is a vector of parameters and $T$ is a multiple RV, and we say that $T$ is jointly sufficient for $\theta$. We emphasize that even if $\theta$ is scalar, $T$ may be multidimensional (Example 9). If $\theta$ and $T$ are of the same dimension, and if $T$ is sufficient for $\theta$, it does not follow that the $j$th component of $T$ is sufficient for the $j$th component of $\theta$ (Example 8). The converse is true under mild conditions (see Fraser [29, p. 21]).


Remark 4. If $T$ is sufficient for $\theta$, any one-to-one function of $T$ is also sufficient. This follows from Theorem 1 since if $U = k(T)$ is a one-to-one function of $T$, then $t = k^{-1}(u)$, and we can write
$$f_\theta(\mathbf{x}) = g_\theta(t)h(\mathbf{x}) = g_\theta(k^{-1}(u))h(\mathbf{x}) = g_\theta^*(u)h(\mathbf{x}).$$
If $T_1, T_2$ are two distinct sufficient statistics, then
$$f_\theta(\mathbf{x}) = g_\theta(t_1)h_1(\mathbf{x}) = g_\theta(t_2)h_2(\mathbf{x}),$$
and it follows that $T_1$ is a function of $T_2$. It does not follow, however, that every function of a sufficient statistic is itself sufficient. For example, in sampling from a normal population, $\bar{X}$ is sufficient for the mean $\mu$ but $\bar{X}^2$ is not. Note that $\bar{X}^2$ is sufficient for $\mu^2$.


Remark 5. As a rule, Theorem 1 cannot be used to show that a given statistic 
T is not sufficient. To do this, one would normally have to use the definition of 
sufficiency. In most cases Theorem 1 will lead to a sufficient statistic if it exists. 


Remark 6. If $T(\mathbf{X})$ is sufficient for $\{F_\theta: \theta \in \Theta\}$, then $T$ is sufficient for $\{F_\theta: \theta \in \omega\}$, where $\omega \subseteq \Theta$. This follows trivially from the definition.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is sufficient. We have
$$P_p\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p^{\sum x_i}(1-p)^{n - \sum x_i},$$
and taking
$$h(x_1, x_2, \ldots, x_n) = 1 \quad\text{and}\quad g_p(x_1, x_2, \ldots, x_n) = (1-p)^n\left(\frac{p}{1-p}\right)^{\sum x_i},$$
we see that $T$ is sufficient. We note that $T_1(\mathbf{X}) = (X_1, X_2 + X_3 + \cdots + X_n)$ and $T_2(\mathbf{X}) = (X_1 + X_2, X_3, X_4 + X_5 + \cdots + X_n)$ are also sufficient for $p$, although $T$ is preferable to $T_1$ or $T_2$.


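Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF
$$P\{X_i = k\} = \frac{1}{N}, \quad k = 1, 2, \ldots, N;\ i = 1, 2, \ldots, n.$$
Then
$$P_N\{X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n\} = \frac{1}{N^n} \ \text{ if } 1 \le k_1, \ldots, k_n \le N, \quad\text{that is,}\quad = \frac{1}{N^n}\,\varphi\Bigl(1, \min_{1 \le i \le n} k_i\Bigr)\,\varphi\Bigl(\max_{1 \le i \le n} k_i, N\Bigr),$$
where $\varphi(a, b) = 1$ if $b \ge a$, and $= 0$ if $b < a$. It follows, by taking $g_N[\max(k_1, \ldots, k_n)] = (1/N^n)\varphi(\max_{1 \le i \le n} k_i, N)$ and $h = \varphi(1, \min_i k_i)$, that $\max(X_1, X_2, \ldots, X_n)$ is sufficient for the family of joint PMFs $P_N$.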
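Example 7 can be verified by exact enumeration for small $n$ and $N$. The sketch below (plain Python; `conditional_given_max` is a hypothetical name, and $n = 3$ with $N = 4, 6$ are arbitrary choices) computes the conditional distribution of the sample given its maximum under two values of $N$ and shows that it does not depend on $N$; it equals $1/(m^n - (m-1)^n)$ for every point whose maximum is $m$.

```python
from fractions import Fraction
from itertools import product


def conditional_given_max(n, N):
    """Map each k in {1..N}^n to P_N{X = k | max(X) = max(k)} (exact enumeration)."""
    joint = {k: Fraction(1, N ** n) for k in product(range(1, N + 1), repeat=n)}
    p_m = {}                                  # P_N{max = m}
    for k, prob in joint.items():
        p_m[max(k)] = p_m.get(max(k), 0) + prob
    return {k: prob / p_m[max(k)] for k, prob in joint.items()}


n = 3
c4 = conditional_given_max(n, 4)
c6 = conditional_given_max(n, 6)
# On every sample point possible under both N = 4 and N = 6 the conditional law agrees.
print(all(c4[k] == c6[k] for k in c4))
```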


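Example 8. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. The joint PDF of $(X_1, X_2, \ldots, X_n)$ is
$$f_{\mu, \sigma^2}(\mathbf{x}) = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{\sum (x_i - \mu)^2}{2\sigma^2}\right] = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left(-\frac{\sum x_i^2}{2\sigma^2} + \frac{\mu\sum x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2}\right).$$
It follows that the statistic
$$T(X_1, X_2, \ldots, X_n) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$$
is jointly sufficient for the parameter $(\mu, \sigma^2)$. An equivalent sufficient statistic that is frequently used is $T_1(X_1, \ldots, X_n) = (\bar{X}, S^2)$. Note that $\bar{X}$ is not sufficient for $\mu$ if $\sigma^2$ is unknown, and $S^2$ is not sufficient for $\sigma^2$ if $\mu$ is unknown. If, however, $\sigma^2$ is known, $\bar{X}$ is sufficient for $\mu$. If $\mu = \mu_0$ is known, $\sum (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.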
Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF
$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}, & \theta - \dfrac{\theta}{2} < x < \theta + \dfrac{\theta}{2},\ \theta > 0,\\ 0, & \text{otherwise.} \end{cases}$$
The joint PDF of $X_1, X_2, \ldots, X_n$ is given by
$$f_\theta(x_1, x_2, \ldots, x_n) = \frac{1}{\theta^n}I_A(x_1, x_2, \ldots, x_n),$$
where
$$A = \left\{(x_1, x_2, \ldots, x_n): \theta - \frac{\theta}{2} < \min x_i \le \max x_i < \theta + \frac{\theta}{2}\right\}.$$
It follows that $(X_{(1)}, X_{(n)})$ is sufficient for $\theta$.

We note that the order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is also sufficient. Note also that the parameter is one-dimensional, the statistic $(X_{(1)}, X_{(n)})$ is two-dimensional, and the order statistic is $n$-dimensional.


In Example 9 we saw that the order statistic is sufficient. This is not a mere coincidence. In fact, if $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ are exchangeable, the joint PDF of $\mathbf{X}$ is a symmetric function of its arguments. Thus
$$f_\theta(x_1, x_2, \ldots, x_n) = f_\theta(x_{(1)}, x_{(2)}, \ldots, x_{(n)}),$$
and it follows that the order statistic is sufficient for $f_\theta$.


The concept of sufficiency is used frequently with another concept, called com- 
pleteness, which we now define. 


Definition 2. Let $\{f_\theta(x), \theta \in \Theta\}$ be a family of PDFs (or PMFs). We say that this family is complete if
$$E_\theta g(X) = 0 \quad\text{for all } \theta \in \Theta$$
implies that
$$P_\theta\{g(X) = 0\} = 1 \quad\text{for all } \theta \in \Theta.$$


Definition 3. A statistic T (X) is said to be complete if the family of distributions 
of T is complete. 


In Definition 3 X will usually be a multiple RV. The family of distributions of T 
is obtained from the family of distributions of X1, X2, ... , Xn by the usual transfor- 
mation technique discussed in Section 4.4. 
Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a sufficient statistic. We show that $T$ is also complete; that is, the family of distributions of $T$, $\{b(n, p), 0 < p < 1\}$, is complete. The condition
$$E_p g(T) = \sum_{t=0}^n g(t)\binom{n}{t}p^t(1-p)^{n-t} = 0 \quad\text{for all } p \in (0, 1)$$
may be rewritten as
$$(1-p)^n\sum_{t=0}^n g(t)\binom{n}{t}\left(\frac{p}{1-p}\right)^t = 0 \quad\text{for all } p \in (0, 1).$$
This is a polynomial in $p/(1-p)$ that vanishes for every value of $p/(1-p) \in (0, \infty)$. Hence the coefficients $g(t)\binom{n}{t}$ must vanish, and it follows that $g(t) = 0$ for $t = 0, 1, 2, \ldots, n$, as required.
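A finite version of Example 10's argument can be checked mechanically. Because the $n + 1$ polynomials $\binom{n}{t}p^t(1-p)^{n-t}$ are linearly independent, evaluating them at any $n + 1$ distinct values of $p$ gives a nonsingular matrix, so $E_p g(T) = 0$ at those points alone already forces $g \equiv 0$. The sketch below verifies full rank exactly for small $n$; the grid $p_j = (j+1)/(n+2)$ and the helper names are arbitrary, hypothetical choices.

```python
from fractions import Fraction
from math import comb


def rank(matrix):
    """Rank of a matrix of Fractions, by exact Gauss-Jordan elimination."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def binomial_system(n):
    """Row j holds C(n, t) p_j^t (1 - p_j)^(n - t), t = 0..n, at p_j = (j + 1)/(n + 2)."""
    ps = [Fraction(j + 1, n + 2) for j in range(n + 1)]
    return [[comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)] for p in ps]


for n in range(1, 7):
    # Full rank: E_p g(T) = 0 at these n + 1 values of p forces g(0) = ... = g(n) = 0.
    print(n, rank(binomial_system(n)) == n + 1)
```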


Example 11. Let $X$ be $N(0, \theta)$. Then the family of PDFs $\{N(0, \theta), \theta > 0\}$ is not complete since $EX = 0$ and $g(x) = x$ is not identically zero. Note that $T(X) = X^2$ is complete, for the PDF of $X^2 \sim \theta\chi^2(1)$ is given by
$$f_\theta(t) = \begin{cases} \dfrac{e^{-t/(2\theta)}}{\sqrt{2\pi\theta t}}, & t > 0,\\ 0, & \text{otherwise.} \end{cases}$$
Then
$$E_\theta g(T) = \frac{1}{\sqrt{2\pi\theta}}\int_0^\infty g(t)t^{-1/2}e^{-t/(2\theta)}\,dt = 0 \quad\text{for all } \theta > 0,$$
which holds if and only if $\int_0^\infty g(t)t^{-1/2}e^{-t/(2\theta)}\,dt = 0$, and using the uniqueness property of Laplace transforms, it follows that
$$g(t)t^{-1/2} = 0 \quad\text{for all } t > 0,$$
that is, $g(t) = 0$.


The next example illustrates the existence of a sufficient statistic that is not com- 
plete. 


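Example 12. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\theta, \theta^2)$. Then $T = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$ is sufficient for $\theta$. However, $T$ is not complete since $E_\theta(\sum X_i)^2 = n\theta^2 + n^2\theta^2$ and $E_\theta\sum X_i^2 = 2n\theta^2$, so that
$$E_\theta\left[2\left(\sum_{i=1}^n X_i\right)^2 - (n+1)\sum_{i=1}^n X_i^2\right] = 0 \quad\text{for all } \theta,$$
and the function $g(x_1, \ldots, x_n) = 2(\sum_{i=1}^n x_i)^2 - (n+1)\sum_{i=1}^n x_i^2$ is not identically zero.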
Example 13. Let $X \sim U(0, \theta)$, $\theta \in (0, \infty)$. We show that the family of PDFs of $X$ is complete. We need to show that
$$E_\theta g(X) = \int_0^\theta \frac{1}{\theta}g(x)\,dx = 0 \quad\text{for all } \theta > 0$$
if and only if $g(x) = 0$ for all $x$. In general, this result follows from Lebesgue integration theory. If $g$ is continuous, we differentiate both sides in
$$\int_0^\theta g(x)\,dx = 0$$
to get $g(\theta) = 0$ for all $\theta > 0$.

Now let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Then the PDF of $X_{(n)}$ is given by
$$f_n(x \mid \theta) = \begin{cases} n\theta^{-n}x^{n-1}, & 0 < x < \theta,\\ 0, & \text{otherwise.} \end{cases}$$
We see by a similar argument that $X_{(n)}$ is complete, which is the same as saying that $\{f_n(x \mid \theta); \theta > 0\}$ is a complete family of densities. Clearly, $X_{(n)}$ is sufficient.


Example 14. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF
$$P_N(x) = \begin{cases} \dfrac{1}{N}, & x = 1, 2, \ldots, N,\\ 0, & \text{otherwise.} \end{cases}$$
We first show that the family of PMFs $\{P_N, N \ge 1\}$ is complete. We have
$$E_N g(X) = \frac{1}{N}\sum_{k=1}^N g(k) = 0 \quad\text{for all } N \ge 1,$$
and this happens if and only if $g(k) = 0$, $k = 1, 2, \ldots, N$. Next we consider the family of PMFs of $X_{(n)} = \max(X_1, \ldots, X_n)$. The PMF of $X_{(n)}$ is given by
$$P_N^{(n)}(x) = \begin{cases} \dfrac{x^n}{N^n} - \dfrac{(x-1)^n}{N^n}, & x = 1, 2, \ldots, N,\\ 0, & \text{otherwise.} \end{cases}$$
Also,
$$E_N g(X_{(n)}) = \sum_{k=1}^N g(k)\left[\frac{k^n - (k-1)^n}{N^n}\right] = 0 \quad\text{for all } N \ge 1.$$
Then
$$E_1 g(X_{(n)}) = g(1) = 0$$
implies that $g(1) = 0$. Again,
$$E_2 g(X_{(n)}) = \frac{g(1)}{2^n} + g(2)\left(\frac{2^n - 1}{2^n}\right) = 0,$$
so that $g(2) = 0$.

Using an induction argument, we conclude that $g(1) = g(2) = \cdots = g(N) = 0$ and hence $g(x) = 0$. It follows that $P_N^{(n)}$ is a complete family of distributions, and $X_{(n)}$ is a complete sufficient statistic.

Now suppose that we exclude the value $N = n_0$ for some fixed $n_0 \ge 1$ from the family $\{P_N: N \ge 1\}$. Let us write $\mathcal{P} = \{P_N: N \ge 1, N \ne n_0\}$. Then $\mathcal{P}$ is not complete. We ask the reader to show that the class of all functions $g$ such that $E_P g(X) = 0$ for all $P \in \mathcal{P}$ consists of functions of the form
$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1, n_0 + 2, n_0 + 3, \ldots,\\ c, & k = n_0,\\ -c, & k = n_0 + 1, \end{cases}$$
where $c$ is a constant, $c \ne 0$.
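The failure of completeness in Example 14 is easy to see numerically for a single observation ($n = 1$). The sketch below takes a nonzero $g$ of the displayed form, with the hypothetical choices $n_0 = 3$ and $c = 5$, and computes $E_N g(X) = N^{-1}\sum_{k \le N} g(k)$ for $N = 1, \ldots, 19$. Every mean vanishes except at the excluded value $N = n_0$, so $g$ is a nonzero function with zero expectation under every member of $\mathcal{P}$.

```python
from fractions import Fraction


def uniform_means(g, Ns):
    """E_N g(X) = (1/N) * sum_{k=1}^N g(k) for X uniform on {1, ..., N}."""
    return {N: Fraction(sum(g(k) for k in range(1, N + 1)), N) for N in Ns}


n0, c = 3, 5   # hypothetical choices of the excluded value and the constant


def g(k):
    """c at n0, -c at n0 + 1, and 0 elsewhere (the form given in the text)."""
    if k == n0:
        return c
    if k == n0 + 1:
        return -c
    return 0


means = uniform_means(g, range(1, 20))
print({N: m for N, m in means.items() if m != 0})   # {3: Fraction(5, 3)}
```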


Remark 7. Completeness is a property of a family of distributions. In Remark 6 
we saw that if a statistic is sufficient for a class of distributions, it is sufficient for 
any subclass of those distributions. Completeness works in the opposite direction. 
Example 14 shows that the exclusion of even one member from the family $\{P_N: N \ge 1\}$ destroys completeness.


The following result covers a large class of probability distributions for which a 
complete sufficient statistic exists. 


Theorem 2. Let $\{f_\theta: \theta \in \Theta\}$ be a $k$-parameter exponential family given by
$$f_\theta(\mathbf{x}) = \exp\left[\sum_{j=1}^k Q_j(\theta)T_j(\mathbf{x}) + D(\theta) + S(\mathbf{x})\right], \tag{2}$$
where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Theta$, an interval in $\mathcal{R}_k$, $T_1, T_2, \ldots, T_k$, and $S$ are defined on $\mathcal{R}_n$, $T = (T_1, T_2, \ldots, T_k)$, and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $k \le n$. Let $Q = (Q_1, Q_2, \ldots, Q_k)$, and suppose that the range of $Q$ contains an open set in $\mathcal{R}_k$. Then
$$T = (T_1(\mathbf{X}), T_2(\mathbf{X}), \ldots, T_k(\mathbf{X}))$$
is a complete sufficient statistic.


Proof. For a complete proof in a general setting, we refer the reader to Lehmann [63, pp. 142-143]. Essentially, the unicity of the Laplace transform is used on the probability distribution induced by $T$. We will content ourselves here by proving the result for the $k = 1$ case when $f_\theta$ is a PMF.
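Let us write $Q(\theta) = \theta$ in (2), and let $(\alpha, \beta) \subseteq \Theta$. We wish to show that
$$E_\theta g(T(\mathbf{X})) = \sum_t g(t)P_\theta\{T(\mathbf{X}) = t\} = \sum_t g(t)\exp[\theta t + D(\theta) + S^*(t)] = 0 \quad\text{for all } \theta \tag{3}$$
implies that $g(t) = 0$; here $e^{S^*(t)} = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t} e^{S(\mathbf{x})}$.

Let us write $x^+ = x$ if $x \ge 0$, $= 0$ if $x < 0$, and $x^- = -x$ if $x < 0$, $= 0$ if $x \ge 0$. Then $g(t) = g^+(t) - g^-(t)$, and both $g^+$ and $g^-$ are nonnegative functions. In terms of $g^+$ and $g^-$, (3) is the same as (after cancelling the factor $e^{D(\theta)}$)
$$\sum_t g^+(t)e^{\theta t + S^*(t)} = \sum_t g^-(t)e^{\theta t + S^*(t)} \tag{4}$$
for all $\theta$.

Let $\theta_0 \in (\alpha, \beta)$ be fixed, and write
$$p^+(t) = \frac{g^+(t)e^{\theta_0 t + S^*(t)}}{\sum_t g^+(t)e^{\theta_0 t + S^*(t)}} \quad\text{and}\quad p^-(t) = \frac{g^-(t)e^{\theta_0 t + S^*(t)}}{\sum_t g^-(t)e^{\theta_0 t + S^*(t)}}. \tag{5}$$
Then both $p^+$ and $p^-$ are PMFs, and by (4) with $\theta = \theta_0$ their denominators are equal. It follows from (4) that
$$\sum_t e^{\delta t}p^+(t) = \sum_t e^{\delta t}p^-(t) \tag{6}$$
for all $\delta \in (\alpha - \theta_0, \beta - \theta_0)$. By the uniqueness of MGFs, (6) implies that
$$p^+(t) = p^-(t) \quad\text{for all } t,$$
and hence, since the denominators in (5) coincide, that $g^+(t) = g^-(t)$ for all $t$, which is equivalent to $g(t) = 0$ for all $t$. Since $T$ is clearly sufficient (by the factorization criterion), it is proved that $T$ is a complete sufficient statistic.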


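Example 15. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. We know that the family of distributions of $\mathbf{X} = (X_1, \ldots, X_n)$ is a two-parameter exponential family with $T(X_1, \ldots, X_n) = (\sum X_i, \sum X_i^2)$ (see Example 8); here $Q(\mu, \sigma^2) = (\mu/\sigma^2, -1/(2\sigma^2))$, whose range $\{(a, b): b < 0\}$ is an open set in $\mathcal{R}_2$. From Theorem 2 it follows that $T$ is a complete sufficient statistic. Examples 10 and 11 fall in the domain of Theorem 2.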


In Examples 6, 8, and 9 we have shown that a given family of probability distributions that admits a nontrivial sufficient statistic usually admits several sufficient statistics. Clearly, we would like to be able to choose the sufficient statistic that results in the greatest reduction of the data. We next study the notion of a minimal sufficient statistic. For this purpose it is convenient to introduce the notion of a sufficient partition. The reader will recall that a partition of a space $\mathcal{X}$ is just a collection of disjoint sets $E_\alpha$ such that $\bigcup_\alpha E_\alpha = \mathcal{X}$. Any statistic $T(X_1, X_2, \ldots, X_n)$ induces a partition of the space of values of $(X_1, X_2, \ldots, X_n)$; that is, $T$ induces a covering of $\mathcal{X}$ by a family $\mathfrak{U}$ of disjoint sets $A_t = \{(x_1, x_2, \ldots, x_n) \in \mathcal{X}: T(x_1, x_2, \ldots, x_n) = t\}$, where $t$ belongs to the range of $T$. The sets $A_t$ are called partition sets. Conversely, given a partition, any assignment of a number to each set so that no two partition sets have the same number assigned defines a statistic. Clearly, this function is not, in general, unique.


Definition 4. Let $\{F_\theta: \theta \in \Theta\}$ be a family of DFs, and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $F_\theta$. Let $\mathfrak{U}$ be a partition of the sample space induced by a statistic $T = T(X_1, X_2, \ldots, X_n)$. We say that $\mathfrak{U} = \{A_t: t \text{ is in the range of } T\}$ is a sufficient partition for $\theta$ (or the family $\{F_\theta: \theta \in \Theta\}$) if the conditional distribution of $\mathbf{X}$, given $T = t$, does not depend on $\theta$ for any $A_t$, provided that the conditional probability is well defined.


Example 16. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. The sample space of values of $(X_1, X_2, \ldots, X_n)$ is the set of $n$-tuples $(x_1, x_2, \ldots, x_n)$, where each $x_i = 0$ or $= 1$, and consists of $2^n$ points. Let $T(X_1, X_2, \ldots, X_n) = \sum X_i$, and consider the partition $\mathfrak{U} = \{A_0, A_1, \ldots, A_n\}$, where $\mathbf{x} \in A_j$ if and only if $\sum x_i = j$, $0 \le j \le n$. Each $A_j$ contains $\binom{n}{j}$ sample points. The conditional probability
$$P_p\{\mathbf{x} \mid A_j\} = \frac{P_p\{\mathbf{x}\}}{P_p(A_j)} = \binom{n}{j}^{-1} \quad\text{if } \mathbf{x} \in A_j,$$
and we see that $\mathfrak{U}$ is a sufficient partition.


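Example 17. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, \theta]$ RVs. Consider the statistic $T(\mathbf{X}) = \max_{1 \le i \le n} X_i$. The space of values of $X_1, X_2, \ldots, X_n$ is the set of points $\{\mathbf{x}: 0 \le x_i \le \theta, i = 1, 2, \ldots, n\}$. $T$ induces a partition $\mathfrak{U}$ on this set. The sets of this partition are $A_t = \{(x_1, x_2, \ldots, x_n): \max(x_1, \ldots, x_n) = t\}$, $t \in [0, \theta]$. We have
$$f_\theta(\mathbf{x} \mid t) = \frac{f_\theta(\mathbf{x})}{f_\theta^T(t)} \quad\text{if } \mathbf{x} \in A_t,$$
where $f_\theta^T(t)$ is the PDF of $T$. We have
$$f_\theta(\mathbf{x} \mid t) = \frac{1/\theta^n}{nt^{n-1}/\theta^n} = \frac{1}{nt^{n-1}} \quad\text{if } \mathbf{x} \in A_t.$$
It follows that $\mathfrak{U} = \{A_t\}$ defines a sufficient partition.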


Remark 8. Clearly, a sufficient statistic $T$ for a family of DFs $\{F_\theta: \theta \in \Theta\}$ induces a sufficient partition; and conversely, given a sufficient partition, we can define a sufficient statistic (not necessarily uniquely) for the family.

Remark 9. Two statistics $T_1, T_2$ that define the same partition must be in one-to-one correspondence, that is, there exists a function $h$ such that $T_1 = h(T_2)$ with a unique inverse, $T_2 = h^{-1}(T_1)$. It follows that if $T_1$ is sufficient, every one-to-one function of $T_1$ is also sufficient.


Let $\mathfrak{U}_1, \mathfrak{U}_2$ be two partitions of a space $\mathcal{X}$. We say that $\mathfrak{U}_1$ is a subpartition of $\mathfrak{U}_2$ if every partition set in $\mathfrak{U}_2$ is a union of sets of $\mathfrak{U}_1$. We sometimes say also that $\mathfrak{U}_1$ is finer than $\mathfrak{U}_2$ ($\mathfrak{U}_2$ is coarser than $\mathfrak{U}_1$) or that $\mathfrak{U}_2$ is a reduction of $\mathfrak{U}_1$. In this case, a statistic $T_2$ that defines $\mathfrak{U}_2$ must be a function of any statistic $T_1$ that defines $\mathfrak{U}_1$. Clearly, this function need not have a unique inverse unless the two partitions have exactly the same partition sets.

Given a family of distributions $\{F_\theta: \theta \in \Theta\}$ for which a sufficient partition exists, we seek to find a sufficient partition $\mathfrak{U}$ that is as coarse as possible; that is, any reduction of $\mathfrak{U}$ leads to a partition that is not sufficient.


Definition 5. A partition $\mathfrak{U}$ is said to be minimal sufficient if


(i) $\mathfrak{U}$ is a sufficient partition, and
(ii) if $\mathfrak{C}$ is any sufficient partition, $\mathfrak{C}$ is a subpartition of $\mathfrak{U}$.


The question of the existence of the minimal partition was settled by Lehmann and Scheffé [62] and, in general, involves measure-theoretic considerations. However, in the cases that we consider, where the sample space is either discrete or a finite-dimensional Euclidean space and the family of distributions of $\mathbf{X}$ is defined by a family of PDFs (PMFs) $\{f_\theta, \theta \in \Theta\}$, such difficulties do not arise. The construction may be described as follows.

Two points $\mathbf{x}$ and $\mathbf{y}$ in the sample space are said to be likelihood equivalent, and we write $\mathbf{x} \sim \mathbf{y}$, if and only if there exists a $k(\mathbf{y}, \mathbf{x}) \ne 0$ which does not depend on $\theta$ such that $f_\theta(\mathbf{y}) = k(\mathbf{y}, \mathbf{x})f_\theta(\mathbf{x})$. We leave it to the reader to check that “$\sim$” is an equivalence relation (that is, it is reflexive, symmetric, and transitive) and hence “$\sim$” defines a partition of the sample space. This partition defines the minimal sufficient partition.


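Example 18. Consider Example 16 again. Then
$$\frac{f_p(\mathbf{x})}{f_p(\mathbf{y})} = \frac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{p^{\sum y_i}(1-p)^{n - \sum y_i}} = \left(\frac{p}{1-p}\right)^{\sum x_i - \sum y_i},$$
and this ratio is independent of $p$ if and only if
$$\sum_{1}^{n} x_i = \sum_{1}^{n} y_i,$$
so that $\mathbf{x} \sim \mathbf{y}$ if and only if $\sum x_i = \sum y_i$. It follows that the partition $\mathfrak{U} = \{A_0, A_1, \ldots, A_n\}$, where $\mathbf{x} \in A_j$ if and only if $\sum x_i = j$, introduced in Example 16, is minimal sufficient.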
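Likelihood equivalence in Example 18 can also be checked by brute force. The sketch below groups the $2^n$ points of $\{0, 1\}^n$ by testing whether $f_p(\mathbf{x})/f_p(\mathbf{y})$ takes a single value over a few values of $p$. The three-point grid is an assumption standing in for "all $p$"; it suffices here because the ratio is $(p/(1-p))^d$, which is constant on the grid only when $d = 0$. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def likelihood(x, p):
    s = sum(x)
    return p ** s * (1 - p) ** (len(x) - s)


def likelihood_classes(n, grid):
    """Group {0,1}^n into classes on which f_p(x)/f_p(rep) is the same for every p in grid."""
    classes = []
    for x in product((0, 1), repeat=n):
        for cls in classes:
            rep = cls[0]
            ratios = {likelihood(x, p) / likelihood(rep, p) for p in grid}
            if len(ratios) == 1:
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


grid = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 4)]
classes = likelihood_classes(4, grid)
for cls in classes:
    print(sum(cls[0]), len(cls))   # class label sum(x) = j and its size C(4, j)
```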
A rigorous proof of the assertion above is beyond the scope of this book. The
basic ideas are outlined in the following theorem.


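Theorem 3. The relation “$\sim$” defined above induces a minimal sufficient partition.


Proof. If $T$ is a sufficient statistic, we have to show that $\mathbf{x} \sim \mathbf{y}$ whenever $T(\mathbf{x}) = T(\mathbf{y})$. This will imply that every set of the minimal sufficient partition is a union of sets of the form $A_t = \{T = t\}$, proving condition (ii) of Definition 5.

Sufficiency of $T$ means that whenever $\mathbf{x} \in A_t$, then
$$f_\theta(\mathbf{x} \mid t) = \frac{f_\theta(\mathbf{x})}{f_\theta^T(t)}$$
is free of $\theta$. It follows that if both $\mathbf{x}$ and $\mathbf{y} \in A_t$, then
$$\frac{f_\theta(\mathbf{x} \mid t)}{f_\theta(\mathbf{y} \mid t)} = \frac{f_\theta(\mathbf{x})}{f_\theta(\mathbf{y})}$$
is independent of $\theta$, and hence $\mathbf{x} \sim \mathbf{y}$.

To prove the sufficiency of the minimal sufficient partition $\mathfrak{U}$, let $T_1$ be an RV that induces $\mathfrak{U}$. Then $T_1$ takes on distinct values over distinct sets of $\mathfrak{U}$ but remains constant on the same set. If $\mathbf{x} \in \{T_1 = t_1\}$, then
$$f_\theta(\mathbf{x} \mid T_1 = t_1) = \frac{f_\theta(\mathbf{x})}{P_\theta\{T_1 = t_1\}}. \tag{7}$$
Now
$$P_\theta\{T_1 = t_1\} = \int_{\{\mathbf{y}:\, T_1(\mathbf{y}) = t_1\}} f_\theta(\mathbf{y})\,d\mathbf{y} \quad\text{or}\quad \sum_{\{\mathbf{y}:\, T_1(\mathbf{y}) = t_1\}} f_\theta(\mathbf{y}),$$
depending on whether the joint distribution of $\mathbf{X}$ is absolutely continuous or discrete. Since $f_\theta(\mathbf{x})/f_\theta(\mathbf{y})$ is independent of $\theta$ whenever $\mathbf{x} \sim \mathbf{y}$, it follows that the ratio on the right-hand side of (7) does not depend on $\theta$. Thus $T_1$ is sufficient.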


Definition 6. A statistic that induces the minimal sufficient partition is called a 
minimal sufficient statistic. 


In view of Theorem 3, a minimal sufficient statistic is a function of every sufficient statistic. It follows that if $T_1$ and $T_2$ are both minimal sufficient, then both must induce the same minimal sufficient partition, and hence $T_1$ and $T_2$ must be equivalent in the sense that each must be a function of the other (with probability 1).

How does one show that a statistic $T$ is not sufficient for a family of distributions $\mathcal{P}$? Other than using the definition of sufficiency, one can sometimes use a result of Lehmann and Scheffé [62] according to which if $T_1(\mathbf{X})$ is sufficient for $\theta$, $\theta \in \Theta$, then $T_2(\mathbf{X})$ is also sufficient if and only if $T_1(\mathbf{x}) = g(T_2(\mathbf{x}))$ for some Borel-measurable function $g$ and all $\mathbf{x} \in B$, where $B$ is a Borel set with $P_\theta(B) = 1$. Another way to prove $T$ nonsufficient is to show that there exist $\mathbf{x}$ and $\mathbf{y}$ for which $T(\mathbf{x}) = T(\mathbf{y})$ but $\mathbf{x}$ and $\mathbf{y}$ are not likelihood equivalent. We refer to Sampson and Spencer [96] for this and similar results.
The following important result is proved in the next section. 


Theorem 4. A complete sufficient statistic is minimal sufficient. 


We emphasize that the converse is not true. A minimal sufficient statistic may not 
be complete. 


Example 19. Suppose that $X \sim U(\theta, \theta + 1)$. Then $X$ is a minimal sufficient statistic. However, $X$ is not complete. Take, for example, $g(x) = \sin 2\pi x$. Then
$$E_\theta g(X) = \int_\theta^{\theta + 1}\sin 2\pi x\,dx = \int_0^1 \sin 2\pi x\,dx = 0$$
for all $\theta$ (the integrand has period 1), and it follows that $X$ is not complete.

If $X_1, X_2, \ldots, X_n$ is a sample from $U(\theta, \theta + 1)$, then $(X_{(1)}, X_{(n)})$ is minimal sufficient for $\theta$ but not complete since
$$E_\theta(X_{(n)} - X_{(1)}) = \frac{n-1}{n+1}$$
for all $\theta$.
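The last claim of Example 19 can be illustrated by simulation. In this Monte Carlo sketch the seed, $n = 5$, the 20,000 replications, and the $\theta$ values are arbitrary, and `mean_range` is a hypothetical name. Because the same seed drives every $\theta$, the simulated ranges agree up to rounding, which is exactly the point: $X_{(n)} - X_{(1)}$ does not involve $\theta$, and its mean is close to $(n-1)/(n+1)$.

```python
import random


def mean_range(theta, n, reps, seed=1):
    """Monte Carlo estimate of E_theta(X_(n) - X_(1)) for samples of size n from U(theta, theta + 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [theta + rng.random() for _ in range(n)]
        total += max(xs) - min(xs)
    return total / reps


n, reps = 5, 20_000
print("exact:", (n - 1) / (n + 1))
for theta in (-3.0, 0.0, 10.0):
    print(theta, mean_range(theta, n, reps))
```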


Finally, we consider statistics that have distributions free of the parameter(s) $\theta$
and seem to contain no information about $\theta$. We will see (Example 23) that such
statistics can sometimes provide useful information about $\theta$.


Definition 7. A statistic $A(\mathbf{X})$ is said to be ancillary if its distribution does not
depend on the underlying model parameter $\theta$.


Example 20. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, 1)$. Then the statistic $A(\mathbf{X}) = (n-1)S^2 = \sum (X_i - \bar{X})^2$ is ancillary since $(n-1)S^2 \sim \chi^2(n-1)$, which is free of $\mu$. Some other ancillary statistics are
$$X_1 - \bar{X}, \quad X_{(n)} - X_{(1)}, \quad\text{and}\quad \sum_{i=1}^n |X_i - \bar{X}|.$$
Also, $\bar{X}$, a complete sufficient statistic (hence minimal sufficient) for $\mu$, is independent of $A(\mathbf{X})$.
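Why the statistics in Example 20 are ancillary can be seen pathwise: writing $X_i = \mu + Z_i$ with $Z_i$ iid $N(0, 1)$, each statistic is unchanged when every observation is shifted by the same amount, so it is a function of the $Z_i$ alone. The sketch below (plain Python; the seed and sample size 8 are arbitrary, `ancillaries` is a hypothetical name) shifts one simulated $N(0, 1)$ sample to several means and shows that the four statistics do not move.

```python
import random
import statistics


def ancillaries(xs):
    """(n-1)S^2, X_1 - Xbar, X_(n) - X_(1), and sum |X_i - Xbar| from Example 20."""
    xbar = statistics.fmean(xs)
    return (
        sum((x - xbar) ** 2 for x in xs),
        xs[0] - xbar,
        max(xs) - min(xs),
        sum(abs(x - xbar) for x in xs),
    )


rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]   # one N(0, 1) sample
for mu in (-5.0, 0.0, 3.7):
    xs = [mu + zi for zi in z]                # the same draw, shifted to N(mu, 1)
    print(mu, [round(s, 10) for s in ancillaries(xs)])
```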
Example 21. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(0, \sigma^2)$. Then $A(\mathbf{X}) = \bar{X}$ follows $N(0, n^{-1}\sigma^2)$ and is not ancillary with respect to the parameter $\sigma^2$.


Example 22. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random sample from the PDF $f(x - \theta)$, where $\theta \in \mathcal{R}$. Then the statistic $A(\mathbf{X}) = (X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)})$ is ancillary for $\theta$.


In Example 20 we saw that $S^2$ was independent of the minimal sufficient statistic
$\bar{X}$. The following result due to Basu shows that this is not a mere coincidence.


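Theorem 5. If $S(\mathbf{X})$ is a complete sufficient statistic for $\theta$, then any ancillary statistic $A(\mathbf{X})$ is independent of $S$.


Proof. If $A$ is ancillary, then $P_\theta\{A(\mathbf{X}) \le a\}$ is free of $\theta$ for all $a$. Consider the conditional probability $g_a(s) = P\{A(\mathbf{X}) \le a \mid S(\mathbf{X}) = s\}$, which does not depend on $\theta$ because $S$ is sufficient. Clearly,
$$E_\theta\{g_a(S(\mathbf{X}))\} = P\{A(\mathbf{X}) \le a\}.$$
Thus
$$E_\theta(g_a(S) - P\{A(\mathbf{X}) \le a\}) = 0$$
for all $\theta$. By completeness of $S$ it follows that
$$P_\theta\{g_a(S) - P\{A \le a\} = 0\} = 1;$$
that is,
$$P_\theta\{A(\mathbf{X}) \le a \mid S(\mathbf{X}) = s\} = P\{A(\mathbf{X}) \le a\}$$
with probability 1. Hence $A$ and $S$ are independent.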


The converse of Basu's theorem is not true. A statistic S that is independent of 
every ancillary statistic need not be complete (see, for example, Lehmann [60]). 

The following example due to R. A. Fisher shows that if there is no sufficient
statistic for $\theta$ but there exists a reasonable statistic not independent of an ancillary
statistic $A(\mathbf{X})$, the recovery of information is sometimes helped by the ancillary
statistic via a conditional analysis. Unfortunately, the lack of uniqueness of ancillary
statistics creates problems with this conditional analysis.


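Example 23. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution with mean $\theta$, and let $Y_1, Y_2, \ldots, Y_n$ be another random sample from an exponential distribution with mean $1/\theta$. Assume that the $X$'s and $Y$'s are independent and consider the problem of estimation of $\theta$ based on the observations $(X_1, X_2, \ldots, X_n; Y_1, Y_2, \ldots, Y_n)$. Let $S_1(\mathbf{x}) = \sum_{i=1}^n x_i$ and $S_2(\mathbf{y}) = \sum_{i=1}^n y_i$. Then $(S_1(\mathbf{X}), S_2(\mathbf{Y}))$ is jointly sufficient for $\theta$. It is easily seen that $(S_1, S_2)$ is a minimal sufficient statistic for $\theta$.

Consider the statistics
$$S(\mathbf{X}, \mathbf{Y}) = \left[\frac{S_1(\mathbf{X})}{S_2(\mathbf{Y})}\right]^{1/2} \quad\text{and}\quad A(\mathbf{X}, \mathbf{Y}) = [S_1(\mathbf{X})S_2(\mathbf{Y})]^{1/2}.$$
Then the joint PDF of $S$ and $A$ is given by
$$\frac{2}{[\Gamma(n)]^2}\exp\left[-A(\mathbf{x}, \mathbf{y})\left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right]\frac{[A(\mathbf{x}, \mathbf{y})]^{2n-1}}{S(\mathbf{x}, \mathbf{y})},$$
and it is clear that $S$ and $A$ are not independent. The marginal distribution of $A$ is given by the PDF
$$C(\mathbf{x}, \mathbf{y})[A(\mathbf{x}, \mathbf{y})]^{2n-1},$$
where $C(\mathbf{x}, \mathbf{y})$ is the constant of integration, which depends only on $\mathbf{x}$, $\mathbf{y}$, and $n$ but not on $\theta$. In fact, $C(\mathbf{x}, \mathbf{y}) = 4K_0[2A(\mathbf{x}, \mathbf{y})]/[\Gamma(n)]^2$, where $K_0$ is the modified Bessel function of the second kind of order zero (Watson [115]). Consequently, $A$ is ancillary for $\theta$.

Clearly, the conditional PDF of $S$ given $A = a$ is of the form
$$\frac{1}{2K_0(2a)S(\mathbf{x}, \mathbf{y})}\exp\left[-a\left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right].$$
The amount of information lost by using $S(\mathbf{X}, \mathbf{Y})$ alone is the $[1/(2n+1)]$th part of the total, and this loss of information is gained by knowledge of the ancillary statistic $A(\mathbf{X}, \mathbf{Y})$. These calculations are discussed in Example 8.5.9.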
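In Example 23, integrating the joint PDF of $(S, A)$ over $S = s$ leaves the factor $\int_0^\infty s^{-1}\exp[-a(s/\theta + \theta/s)]\,ds$, and the substitution $s = \theta u$ shows that this integral does not depend on $\theta$, which is why $A$ is ancillary. The sketch below checks that numerically for $a = 1$ and a few $\theta$ values, using the trapezoidal rule after the substitution $s = e^v$. The function name, step count, and truncation width are arbitrary choices, not taken from the text. By the expression for $C(\mathbf{x}, \mathbf{y})$ above, the common value should equal $2K_0(2a)$.

```python
import math


def s_integral(a, theta, half_width=40.0, steps=8000):
    """Approximate the integral over s > 0 of (1/s) * exp(-a * (s/theta + theta/s)).

    Substituting s = exp(v) gives ds/s = dv; the v-integrand decays
    double-exponentially, so the trapezoidal rule on [-half_width, half_width]
    is very accurate. Both defaults are arbitrary choices.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        v = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-a * (math.exp(v) / theta + theta * math.exp(-v)))
    return total * h


for theta in (0.5, 1.0, 3.0):
    print(theta, s_integral(1.0, theta))   # the same value for every theta
```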


PROBLEMS 8.3 


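1. Find a sufficient statistic in each of the following cases based on a random sample of size $n$:
(a) $X \sim B(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known; and (iii) $\alpha, \beta$ are both unknown.
(b) $X \sim G(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known; and (iii) $\alpha, \beta$ are both unknown.
(c) $X \sim P_{N_1, N_2}(x)$, where
$$P_{N_1, N_2}(x) = \frac{1}{N_2 - N_1}, \quad x = N_1 + 1, N_1 + 2, \ldots, N_2,$$
and $N_1, N_2$ ($N_1 < N_2$) are integers, when (i) $N_1$ is known, $N_2$ unknown; (ii) $N_2$ known, $N_1$ unknown; and (iii) $N_1, N_2$ are both unknown.
(d) $X \sim f_\theta(x)$, where
$$f_\theta(x) = \begin{cases} e^{\theta - x} & \text{if } \theta < x < \infty,\\ 0 & \text{otherwise.} \end{cases}$$
(e) $X \sim f(x; \mu, \sigma)$, where
$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\log x - \mu)^2}{2\sigma^2}\right], \quad x > 0.$$
(f) $X \sim f_\theta(x)$, where
$$f_\theta(x) = P_\theta\{X = x\} = c(\theta)2^{-x/\theta}, \quad x = \theta, \theta + 1, \ldots,\ \theta > 0,$$
and
$$c(\theta) = 2^{1 - 1/\theta}(2^{1/\theta} - 1).$$
(g) $X \sim P_{\theta, p}(x)$, where
$$P_{\theta, p}(x) = (1-p)p^{x - \theta}, \quad x = \theta, \theta + 1, \ldots,\ 0 < p < 1,$$
when (i) $p$ is known, $\theta$ unknown; (ii) $p$ is unknown, $\theta$ known; and (iii) $p, \theta$ are both unknown.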


2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $N(\alpha\sigma, \sigma^2)$, where $\alpha$ is a known real number. Show that the statistic $T(\mathbf{X}) = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$ is sufficient for $\sigma$ but that the family of distributions of $T(\mathbf{X})$ is not complete.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is clearly sufficient for the family $N(\mu, \sigma^2)$, $\mu \in \mathcal{R}$, $\sigma > 0$. Is the family of distributions of $\mathbf{X}$ complete?


4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$, $\theta \in \mathcal{R}$. Show that the statistic $T(X_1, \ldots, X_n) = (\min X_i, \max X_i)$ is sufficient for $\theta$ but not complete.


5. If $T = g(U)$ and $T$ is sufficient, so is $U$.


6. In Example 14, show that the class of all functions $g$ for which $E_P g(X) = 0$ for all $P \in \mathcal{P}$ consists of functions of the form
$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1, n_0 + 2, n_0 + 3, \ldots,\\ c, & k = n_0,\\ -c, & k = n_0 + 1, \end{cases}$$
where $c$ is a constant.
7. For the class $\{F_{\theta_0}, F_{\theta_1}\}$ of two DFs where $F_{\theta_0}$ is $N(0, 1)$ and $F_{\theta_1}$ is $C(1, 0)$, find a sufficient statistic.


8. Consider the class of hypergeometric probability distributions $\{P_D : D = 0, 1, 2, \ldots, N\}$, where
$$P_D\{X = x\} = \binom{N}{n}^{-1}\binom{D}{x}\binom{N - D}{n - x}, \qquad x = 0, 1, \ldots, \min(n, D).$$
Show that it is a complete class. If $\mathcal P = \{P_D : D = 0, 1, 2, \ldots, N,\ D \ne d,\ d \text{ integral},\ 0 \le d \le N\}$, is $\mathcal P$ complete?


9. Is the family of distributions of the order statistic in sampling from a Poisson 
distribution complete? 


10. Let $(X_1, X_2, \ldots, X_n)$ be a random vector of the discrete type. Is the statistic $T(X_1, \ldots, X_n) = (X_1, \ldots, X_{n-1})$ sufficient?


11. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with law $\mathcal L(X)$. Find a minimal sufficient statistic in each of the following cases:
(a) $X \sim P(\lambda)$.
(b) $X \sim U[0, \theta]$.
(c) $X \sim NB(1; p)$.
(d) $X \sim P_N$, where $P_N\{X = k\} = 1/N$ if $k = 1, 2, \ldots, N$, and $= 0$ otherwise.
(e) $X \sim N(\mu, \sigma^2)$.
(f) $X \sim G(\alpha, \beta)$.
(g) $X \sim B(\alpha, \beta)$.
(h) $X \sim f_\theta(x)$, where $f_\theta(x) = (2/\theta^2)(\theta - x)$, $0 < x < \theta$.
12. Let $X_1, X_2$ be a sample of size 2 from $P(\lambda)$. Show that the statistic $X_1 + \alpha X_2$, where $\alpha > 1$ is an integer, is not sufficient for $\lambda$.


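13. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF
$$f_\theta(x) = \begin{cases} \dfrac{x}{\theta}\,e^{-x^2/(2\theta)} & \text{if } x > 0,\\[1ex] 0 & \text{if } x \le 0,\end{cases} \qquad \theta > 0.$$
Show that $\sum_{i=1}^n X_i^2$ is a minimal sufficient statistic for $\theta$, but $\sum_{i=1}^n X_i$ is not sufficient.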


14. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(0, \sigma^2)$. Show that $\sum_{i=1}^n X_i^2$ is a minimal sufficient statistic but $\sum_{i=1}^n X_i$ is not sufficient for $\sigma^2$.


15. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF $f_{\alpha,\beta}(x) = \beta e^{-\beta(x - \alpha)}$ if $x > \alpha$, and $= 0$ if $x \le \alpha$. Find a minimal sufficient statistic for $(\alpha, \beta)$.
16. Let $T$ be a minimal sufficient statistic. Show that a necessary condition for a sufficient statistic $U$ to be complete is that $U$ be minimal.


17. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Show that $(\bar X, S^2)$ is independent of each of $(X_{(n)} - X_{(1)})/S$, $(X_{(n)} - \bar X)/S$, and $\sum_{i=1}^{n-1}(X_{i+1} - X_i)^2/S^2$.

18. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Show that a necessary and sufficient condition for $\sum_{i=1}^n a_iX_i$ and $\sum_{i=1}^n X_i$ to be independent is $\sum_{i=1}^n a_i = 0$.

19. Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$. Show that $X_{(1)}$ is a complete sufficient statistic which is independent of $S^2$.


20. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = (1/\theta)\exp(-x/\theta)$, $x > 0$, $\theta > 0$. Show that $\bar X$ must be independent of every scale-invariant statistic, such as $X_1/\sum_{j=1}^n X_j$.


21. Let $T_1, T_2$ be two statistics with common domain $D$. Then $T_1$ is a function of $T_2$ if and only if
$$\text{for all } x, y \in D, \quad T_2(x) = T_2(y) \implies T_1(x) = T_1(y).$$


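22. Let $S$ be the support of $f_\theta$, $\theta \in \Theta$, and let $T$ be a statistic such that for some $\theta_1, \theta_2 \in \Theta$, and $x, y \in S$, $x \ne y$, $T(x) = T(y)$ but $f_{\theta_1}(x)f_{\theta_2}(y) \ne f_{\theta_2}(x)f_{\theta_1}(y)$. Then show that $T$ is not sufficient for $\theta$.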

23. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Use the result in Problem 22 to show that $\left(\sum_{i=1}^n X_i\right)^2$ is not sufficient for $\theta$.


24. (a) If T is complete, show that any one-to-one mapping of T is also complete. 


(b) Show with the help of an example that a complete statistic is not unique for 
a family of distributions. 


8.4 UNBIASED ESTIMATION 


In this section we focus attention on the class of unbiased estimators. We develop 
a criterion to check if an unbiased estimator is optimal in this class. Using suffi- 
ciency and completeness, we describe a method of constructing uniformly minimum 
variance unbiased estimators. 


Definition 1. Let $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathcal R_k$, be a nonempty set of probability distributions. Let $\mathbf X = (X_1, X_2, \ldots, X_n)$ be a multiple RV with DF $F_\theta$ and sample space $\mathcal X$. Let $\psi : \Theta \to \mathcal R$ be a real-valued parametric function. A Borel-measurable function $T : \mathcal X \to \mathcal R$ is said to be unbiased for $\psi$ if
$$E_\theta T(\mathbf X) = \psi(\theta) \quad \text{for all } \theta \in \Theta. \tag{1}$$

Any parametric function $\psi$ for which there exists a $T$ satisfying (1) is called an estimable function. An estimator that is not unbiased is called biased, and the function $b(T, \psi)$, defined by
$$b(T, \psi) = E_\theta T(\mathbf X) - \psi(\theta), \tag{2}$$
is called the bias of $T$.


Remark 1. Definition 1, in particular, requires that $E_\theta|T| < \infty$ for all $\theta \in \Theta$, and it can be extended to the case when both $\psi$ and $T$ are multidimensional. In most applications we consider $\Theta \subseteq \mathcal R_1$, $\psi(\theta) = \theta$, and $X_1, X_2, \ldots, X_n$ iid RVs.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population with finite mean. Then $\bar X$ is unbiased for the population mean. If the population variance is finite, the sample variance $S^2$ is unbiased for the population variance. In general, if the $k$th population moment $m_k$ exists, the $k$th sample moment is unbiased for $m_k$.

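Note that $S$ is not, in general, unbiased for $\sigma$. If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ RVs, we know that $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$. Therefore,
$$E\left(\frac{\sqrt{n-1}\,S}{\sigma}\right) = \int_0^\infty \sqrt{x}\,\frac{x^{(n-1)/2 - 1}e^{-x/2}}{2^{(n-1)/2}\,\Gamma[(n-1)/2]}\,dx = \frac{\sqrt 2\,\Gamma(n/2)}{\Gamma[(n-1)/2]}$$
and
$$ES = \sigma\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}.$$
The bias of $S$ is given by
$$b(S, \sigma) = \sigma\left[\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]} - 1\right].$$
We note that $b(S, \sigma) \to 0$ as $n \to \infty$, so that $S$ is asymptotically unbiased for $\sigma$.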








If $T$ is unbiased for $\theta$, $g(T)$ is not, in general, an unbiased estimator of $g(\theta)$ unless $g$ is a linear function; $S = \sqrt{S^2}$ above is an example.
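The bias formula in Example 1 is easy to check numerically. The sketch below, a minimal illustration rather than anything from the text, evaluates $E(S)/\sigma = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma[(n-1)/2]$ through `math.lgamma` and compares it with a seeded Monte Carlo average of $S$ over $N(0, 1)$ samples; the function names, the seed, and the sample sizes are illustrative choices.

```python
import math
import random


def expected_s_over_sigma(n: int) -> float:
    """E(S)/sigma for a normal sample of size n, from Example 1."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sample_sd(xs):
    """Sample standard deviation S with divisor n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


def simulated_s_over_sigma(n: int, reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo average of S over N(0, 1) samples of size n (so sigma = 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sample_sd([rng.gauss(0.0, 1.0) for _ in range(n)])
    return total / reps


for n in (2, 5, 10, 50):
    # relative bias b(S, sigma)/sigma = E(S)/sigma - 1
    print(n, expected_s_over_sigma(n) - 1.0)
```

The relative bias is negative for every $n$ and shrinks toward 0, in line with $b(S, \sigma) \to 0$; working on the log scale with `lgamma` avoids overflow of $\Gamma(n/2)$ for large $n$.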


Example 2. Unbiased estimators do not always exist. Consider an RV with PMF $b(1, p)$. Suppose that we wish to estimate $\psi(p) = p^2$. Then, in order that $T$ be unbiased for $p^2$, we must have
$$p^2 = E_pT = pT(1) + (1 - p)T(0), \qquad 0 \le p \le 1;$$
that is,
$$p^2 = p\{T(1) - T(0)\} + T(0)$$
must hold for all $p$ in the interval $[0, 1]$, which is impossible. (If a convergent power series vanishes in an open interval, each of the coefficients must be 0; here the coefficient of $p^2$ is 1. See also Problem 1.)


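Example 3. Sometimes an unbiased estimator may be absurd. Let $X$ be $P(\lambda)$ and $\psi(\lambda) = e^{-3\lambda}$. We show that $T(X) = (-2)^X$ is unbiased for $\psi(\lambda)$. We have
$$E_\lambda T(X) = e^{-\lambda}\sum_{x=0}^\infty (-2)^x\frac{\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^\infty \frac{(-2\lambda)^x}{x!} = e^{-\lambda}e^{-2\lambda} = \psi(\lambda).$$
However, $T(x) = (-2)^x > 0$ if $x$ is even and $< 0$ if $x$ is odd, which is absurd since $\psi(\lambda) > 0$.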
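A quick numerical check of Example 3, assuming nothing beyond the series in the text: the partial sums of $e^{-\lambda}\sum_x (-2\lambda)^x/x!$ should match $e^{-3\lambda}$, even though every individual value $(-2)^x$ is far from $e^{-3\lambda}$. The helper name and the truncation at 200 terms are illustrative choices.

```python
import math


def mean_of_minus_two_power(lam: float, terms: int = 200) -> float:
    """E[(-2)**X] for X ~ P(lam), summing e^{-lam} (-2 lam)^x / x! over x."""
    total = 0.0
    term = math.exp(-lam)  # the x = 0 term
    for x in range(terms):
        total += term
        term *= -2.0 * lam / (x + 1)  # ratio of the (x+1)th term to the xth term
    return total


for lam in (0.5, 1.0, 3.0):
    print(lam, mean_of_minus_two_power(lam), math.exp(-3.0 * lam))
```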


Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Then $\bar X$ is unbiased for $\lambda$ and so also is $S^2$, since both the mean and the variance are equal to $\lambda$. Indeed, $\alpha\bar X + (1 - \alpha)S^2$, $0 \le \alpha \le 1$, is unbiased for $\lambda$.


Let $\theta$ be estimable, and let $T$ be an unbiased estimator of $\theta$. Let $T_1$ be another unbiased estimator of $\theta$, different from $T$. This means that there exists at least one $\theta$ such that $P_\theta(T \ne T_1) > 0$. In this case there exist infinitely many unbiased estimators of $\theta$ of the form $\alpha T + (1 - \alpha)T_1$, $0 < \alpha < 1$. It is therefore desirable to find a procedure to differentiate among these estimators.


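Definition 2. Let $\theta_0 \in \Theta$ and $\mathcal U(\theta_0)$ be the class of all unbiased estimators $T$ of $\theta$ such that $E_{\theta_0}T^2 < \infty$. Then $T_0 \in \mathcal U(\theta_0)$ is called a locally minimum variance unbiased estimator (LMVUE) at $\theta_0$ if
$$E_{\theta_0}(T_0 - \theta_0)^2 \le E_{\theta_0}(T - \theta_0)^2 \tag{3}$$
holds for all $T \in \mathcal U(\theta_0)$.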

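Definition 3. Let $\mathcal U$ be the set of all unbiased estimators $T$ of $\theta \in \Theta$ such that $E_\theta T^2 < \infty$ for all $\theta \in \Theta$. An estimator $T_0 \in \mathcal U$ is called a uniformly minimum variance unbiased estimator (UMVUE) of $\theta$ if
$$E_\theta(T_0 - \theta)^2 \le E_\theta(T - \theta)^2 \tag{4}$$
for all $\theta \in \Theta$ and every $T \in \mathcal U$.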


Remark 2. Let $a_1, a_2, \ldots, a_n$ be any set of real numbers with $\sum_{i=1}^n a_i = 1$. Let $X_1, X_2, \ldots, X_n$ be independent RVs with common mean $\mu$ and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$. Then $T = \sum_{i=1}^n a_iX_i$ is an unbiased estimator of $\mu$ with variance $\sum_{i=1}^n a_i^2\sigma_i^2$ (see Theorem 4.5.6). $T$ is called a linear unbiased estimator of $\mu$. Linear unbiased estimators of $\mu$ that have minimum variance (among all linear unbiased estimators) are called best linear unbiased estimators (BLUEs). In Theorem 4.5.6 (Corollary 2) we have shown that if the $X_i$ are iid RVs with common variance $\sigma^2$, the BLUE of $\mu$ is $\bar X = n^{-1}\sum_{i=1}^n X_i$. If the $X_i$ are independent with common mean $\mu$ but different variances $\sigma_i^2$, the BLUE of $\mu$ is obtained if we choose $a_i$ proportional to $1/\sigma_i^2$; then the minimum variance is $H/n$, where $H$ is the harmonic mean of $\sigma_1^2, \ldots, \sigma_n^2$ (see Example 4.5.4).
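To make Remark 2 concrete, the sketch below computes the inverse-variance weights $a_i \propto 1/\sigma_i^2$, evaluates $\sum a_i^2\sigma_i^2$, and compares it with $H/n$ and with the equal-weight estimator $\bar X$. The variances used are made-up illustrative numbers, and the function names are hypothetical.

```python
def blue_weights(variances):
    """Weights a_i proportional to 1/sigma_i**2, normalized so that sum(a_i) = 1."""
    inv = [1.0 / v for v in variances]
    s = sum(inv)
    return [w / s for w in inv]


def linear_estimator_variance(weights, variances):
    """Variance of sum(a_i X_i) for independent X_i: sum(a_i**2 * sigma_i**2)."""
    return sum(a * a * v for a, v in zip(weights, variances))


variances = [1.0, 4.0, 0.25, 2.0]  # illustrative sigma_i**2 values
n = len(variances)
a = blue_weights(variances)
harmonic_mean = n / sum(1.0 / v for v in variances)
print(linear_estimator_variance(a, variances), harmonic_mean / n)
print(linear_estimator_variance([1.0 / n] * n, variances))  # equal weights, for comparison
```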


Remark 3. Sometimes the precision of an estimator $T$ of a parameter $\theta$ is measured by the mean square error (MSE). We say that an estimator $T_0$ is at least as good as any other estimator $T$ in the sense of the MSE if
$$E_\theta(T_0 - \theta)^2 \le E_\theta(T - \theta)^2 \quad \text{for all } \theta \in \Theta. \tag{5}$$
In general, a particular estimator will be better than another for some values of $\theta$ and worse for others. Definitions 2 and 3 are special cases of this concept if we restrict attention to unbiased estimators.


The following result gives a necessary and sufficient condition for an unbiased 
estimator to be a UMVUE. 


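Theorem 1. Let $\mathcal U$ be the class of all unbiased estimators $T$ of a parameter $\theta \in \Theta$ with $E_\theta T^2 < \infty$ for all $\theta$, and suppose that $\mathcal U$ is nonempty. Let $\mathcal U_0$ be the class of all unbiased estimators $v$ of 0, that is,
$$\mathcal U_0 = \{v : E_\theta v = 0,\ E_\theta v^2 < \infty \text{ for all } \theta \in \Theta\}.$$
Then $T_0 \in \mathcal U$ is a UMVUE if and only if
$$E_\theta(vT_0) = 0 \quad \text{for all } \theta \text{ and all } v \in \mathcal U_0. \tag{6}$$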


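Proof. The conditions of the theorem guarantee the existence of $E_\theta(vT_0)$ for all $\theta$ and $v \in \mathcal U_0$. Suppose that $T_0 \in \mathcal U$ is a UMVUE and $E_{\theta_0}(v_0T_0) \ne 0$ for some $\theta_0$ and some $v_0 \in \mathcal U_0$. Then $T_0 + \lambda v_0 \in \mathcal U$ for all real $\lambda$. If $E_{\theta_0}v_0^2 = 0$, then $E_{\theta_0}(v_0T_0) = 0$ must hold since $P_{\theta_0}\{v_0 = 0\} = 1$. Let $E_{\theta_0}v_0^2 > 0$. Choose $\lambda_0 = -E_{\theta_0}(T_0v_0)/E_{\theta_0}v_0^2$. Then
$$E_{\theta_0}(T_0 + \lambda_0v_0)^2 = E_{\theta_0}T_0^2 - \frac{[E_{\theta_0}(v_0T_0)]^2}{E_{\theta_0}v_0^2} < E_{\theta_0}T_0^2. \tag{7}$$
Since $T_0 + \lambda_0v_0 \in \mathcal U$ and $T_0 \in \mathcal U$ both have mean $\theta_0$ under $\theta_0$, it follows from (7) that
$$\operatorname{var}_{\theta_0}(T_0 + \lambda_0v_0) < \operatorname{var}_{\theta_0}(T_0), \tag{8}$$
which is a contradiction. It follows that (6) holds.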
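Conversely, let (6) hold for some $T_0 \in \mathcal U$, all $\theta \in \Theta$ and all $v \in \mathcal U_0$, and let $T \in \mathcal U$. Then $T_0 - T \in \mathcal U_0$, and for every $\theta$,
$$E_\theta\{T_0(T_0 - T)\} = 0.$$
We have
$$E_\theta T_0^2 = E_\theta(TT_0) \le (E_\theta T_0^2)^{1/2}(E_\theta T^2)^{1/2}$$
by the Cauchy-Schwarz inequality. If $E_\theta T_0^2 = 0$, then $P_\theta(T_0 = 0) = 1$ and there is nothing to prove. Otherwise,
$$(E_\theta T_0^2)^{1/2} \le (E_\theta T^2)^{1/2},$$
or $\operatorname{var}_\theta(T_0) \le \operatorname{var}_\theta(T)$, since $T_0$ and $T$ have the same mean $\theta$. Since $T$ is arbitrary, the proof is complete.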


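Theorem 2. Let $\mathcal U$ be the nonempty class of unbiased estimators as defined in Theorem 1. Then there exists at most one UMVUE for $\theta$.

Proof. If $T$ and $T_0 \in \mathcal U$ are both UMVUEs, then $T - T_0 \in \mathcal U_0$ and
$$E_\theta\{T_0(T - T_0)\} = 0 \quad \text{for all } \theta \in \Theta,$$
that is, $E_\theta T_0^2 = E_\theta(TT_0)$, and it follows that
$$\operatorname{cov}(T, T_0) = \operatorname{var}_\theta(T_0) \quad \text{for all } \theta.$$
Since $T_0$ and $T$ are both UMVUEs, $\operatorname{var}_\theta(T) = \operatorname{var}_\theta(T_0)$, and it follows that the correlation coefficient between $T$ and $T_0$ is 1. Equivalently, $\operatorname{var}_\theta(T - T_0) = \operatorname{var}_\theta(T) + \operatorname{var}_\theta(T_0) - 2\operatorname{cov}(T, T_0) = 0$, so that $T - T_0$ is constant with probability 1. Since $T$ and $T_0$ are both unbiased for $\theta$, this constant is 0, and we must have $P_\theta\{T = T_0\} = 1$ for all $\theta$.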


Remark 4. Both Theorems 1 and 2 have analogs for LMVUEs at $\theta_0 \in \Theta$, $\theta_0$ fixed.


Theorem 3. If UMVUEs $T_i$ exist for real functions $\psi_i$, $i = 1, 2$, of $\theta$, they also exist for $\lambda\psi_i$ ($\lambda$ real), as well as for $\psi_1 + \psi_2$, and are given by $\lambda T_i$ and $T_1 + T_2$, respectively.


Theorem 4. Let $\{T_n\}$ be a sequence of UMVUEs and $T$ be a statistic with $E_\theta T^2 < \infty$ and such that $E_\theta(T_n - T)^2 \to 0$ as $n \to \infty$ for all $\theta \in \Theta$. Then $T$ is also the UMVUE.

Proof. That $T$ is unbiased follows from $|E_\theta T - \theta| \le E_\theta|T - T_n| \le [E_\theta(T_n - T)^2]^{1/2}$. For all $v \in \mathcal U_0$, all $\theta$, and every $n = 1, 2, \ldots$,
$$E_\theta(T_nv) = 0$$
by Theorem 1. Therefore,
$$E_\theta(vT) = E_\theta(vT) - E_\theta(vT_n) = E_\theta[v(T - T_n)]$$
and
$$|E_\theta(vT)| \le (E_\theta v^2)^{1/2}[E_\theta(T - T_n)^2]^{1/2} \to 0 \quad \text{as } n \to \infty$$
for all $\theta$ and all $v \in \mathcal U_0$. Thus
$$E_\theta(vT) = 0 \quad \text{for all } v \in \mathcal U_0, \text{ all } \theta \in \Theta,$$
and by Theorem 1, $T$ must be the UMVUE.


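Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$. Then $\bar X$ is the UMVUE of $\lambda$. Surely, $\bar X$ is unbiased. Let $g(\bar X)$ be any unbiased estimator of 0 that is a function of $\bar X$. Then $T = \bar X + g(\bar X)$ is also unbiased for $\lambda$. But $\bar X$ is complete. It follows that
$$E_\lambda g(\bar X) = 0 \ \text{ for all } \lambda > 0 \implies g(x) = 0 \ \text{ for } x = 0, 1/n, 2/n, \ldots.$$
Hence $\bar X$ is the only unbiased estimator of $\lambda$ that is a function of the complete sufficient statistic $\bar X$, and it must be the UMVUE of $\lambda$ (compare Theorem 7 below).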


Example 6. Sometimes an estimator with larger variance may be preferable.

Let $X$ be a $G(1, 1/\beta)$ RV. $X$ is usually taken as a good model to describe the time to failure of a piece of equipment. Let $X_1, X_2, \ldots, X_n$ be a sample of $n$ observations on $X$. Then $\bar X$ is unbiased for $EX = 1/\beta$ with variance $1/(n\beta^2)$. ($\bar X$ is actually the UMVUE for $1/\beta$.) Now consider $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, which is exponential with mean $1/(n\beta)$. Then $nX_{(1)}$ is unbiased for $1/\beta$ with variance $1/\beta^2$, and it has a larger variance than $\bar X$. However, if the length of time is of importance, $nX_{(1)}$ may be preferable to $\bar X$, since to observe $nX_{(1)}$ one needs to wait only until the first piece of equipment fails, whereas to compute $\bar X$ one would have to wait until all the $n$ observations $X_1, X_2, \ldots, X_n$ are available.
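The trade-off in Example 6 can be seen in a small simulation. This is a sketch under assumed values ($\beta = 2$, $n = 5$, a fixed seed); `random.expovariate(beta)` draws an exponential variate with mean $1/\beta$, which is the $G(1, 1/\beta)$ model of the example.

```python
import random


def mean_and_variance(values):
    """Sample mean and sample variance (divisor len - 1)."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def compare_estimators(beta: float, n: int, reps: int = 50_000, seed: int = 2):
    """Simulate X-bar and n * min(X_i) for exponential lifetimes with mean 1/beta."""
    rng = random.Random(seed)
    xbar_values, nmin_values = [], []
    for _ in range(reps):
        lifetimes = [rng.expovariate(beta) for _ in range(n)]  # each has mean 1/beta
        xbar_values.append(sum(lifetimes) / n)
        nmin_values.append(n * min(lifetimes))
    return mean_and_variance(xbar_values), mean_and_variance(nmin_values)


(m_bar, v_bar), (m_min, v_min) = compare_estimators(beta=2.0, n=5)
# theoretical values: both means 1/beta = 0.5; variances 1/(n beta^2) = 0.05 and 1/beta^2 = 0.25
print(m_bar, v_bar)
print(m_min, v_min)
```

Both estimators are centered at $1/\beta$; the price of waiting only for the first failure is a variance about $n$ times larger.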
Theorem 5. If a sample consists of $n$ independent observations $X_1, X_2, \ldots, X_n$ from the same distribution, the UMVUE, if it exists, is a symmetric function of the $X_i$'s.

The proof is left as an exercise.

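The converse of Theorem 5 is not true. If $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs, $\lambda > 0$, both $\bar X$ and $S^2$ are unbiased for $\lambda$, and both are symmetric functions of the $X_i$'s. But $\bar X$ is the UMVUE, whereas $S^2$ is not.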
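A simulation sketch of the remark above, under assumed values $\lambda = 2$, $n = 10$: both $\bar X$ and $S^2$ average out to $\lambda$, but $S^2$ has a much larger variance. The Poisson sampler is a hypothetical helper using Knuth's multiplication method (the standard library `random` module has no Poisson generator), and the reference value $\operatorname{var}(S^2) = \lambda/n + 2\lambda^2/(n-1)$ follows from the fourth central moment $\mu_4 = \lambda + 3\lambda^2$ of the Poisson distribution.

```python
import math
import random


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for a P(lam) variate; fine for small lam."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def mean_and_variance(values):
    """Sample mean and sample variance (divisor len - 1)."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def simulate_xbar_and_s2(lam: float, n: int, reps: int = 40_000, seed: int = 3):
    """Simulated mean and variance of X-bar and of S^2 for P(lam) samples of size n."""
    rng = random.Random(seed)
    xbars, s2s = [], []
    for _ in range(reps):
        xs = [poisson_draw(rng, lam) for _ in range(n)]
        m = sum(xs) / n
        xbars.append(m)
        s2s.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    return mean_and_variance(xbars), mean_and_variance(s2s)


lam, n = 2.0, 10
(mx, vx), (ms, vs) = simulate_xbar_and_s2(lam, n)
print(mx, vx, lam / n)                           # X-bar: theoretical variance lam/n
print(ms, vs, lam / n + 2 * lam ** 2 / (n - 1))  # S^2: theoretical variance lam/n + 2 lam^2/(n-1)
```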

We now turn our attention to some methods for finding UMVUEs. 


Theorem 6 (Blackwell [9], Rao [85]). Let $\{F_\theta : \theta \in \Theta\}$ be a family of probability DFs and $h$ be any statistic in $\mathcal U$, where $\mathcal U$ is the (nonempty) class of all unbiased estimators of $\theta$ with $E_\theta h^2 < \infty$. Let $T$ be a sufficient statistic for $\{F_\theta, \theta \in \Theta\}$. Then the conditional expectation $E_\theta\{h \mid T\}$ is independent of $\theta$ and is an unbiased estimator of $\theta$. Moreover,
$$E_\theta(E\{h \mid T\} - \theta)^2 \le E_\theta(h - \theta)^2 \quad \text{for all } \theta \in \Theta. \tag{9}$$
The equality in (9) holds if and only if $h = E\{h \mid T\}$ (that is, $P_\theta\{h = E\{h \mid T\}\} = 1$ for all $\theta$).

Proof. We have
$$E_\theta(E\{h \mid T\}) = E_\theta h = \theta.$$
It is therefore sufficient to show that
$$E_\theta\{E\{h \mid T\}\}^2 \le E_\theta h^2 \quad \text{for all } \theta \in \Theta. \tag{10}$$
But $E_\theta h^2 = E_\theta\{E\{h^2 \mid T\}\}$, so that it will be sufficient to show that
$$[E\{h \mid T\}]^2 \le E\{h^2 \mid T\}. \tag{11}$$
By the Cauchy-Schwarz inequality
$$E^2\{h \mid T\} \le E\{h^2 \mid T\}\,E\{1 \mid T\},$$
and (11) follows. The equality holds in (9) if and only if
$$E_\theta\{E\{h \mid T\}\}^2 = E_\theta h^2, \tag{12}$$
that is,
$$E_\theta[E\{h^2 \mid T\} - E^2\{h \mid T\}] = 0,$$
which is the same as
$$E_\theta\{\operatorname{var}\{h \mid T\}\} = 0.$$
This happens if and only if $\operatorname{var}\{h \mid T\} = 0$, that is, if and only if
$$E\{h^2 \mid T\} = E^2\{h \mid T\},$$
as will be the case if and only if $h$ is a function of $T$. Thus $h = E\{h \mid T\}$ with probability 1.
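Theorem 6 can be checked exactly on a small case. The sketch below, an illustration with assumed values $n = 4$ and $p = 1/3$, takes $h = X_1$ (unbiased for $p$ in a $b(1, p)$ sample), computes $E\{X_1 \mid T\}$ for the sufficient statistic $T = \sum X_i$ by enumerating all $2^n$ outcomes with exact fractions, and compares the two variances.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell_bernoulli(n: int, p: Fraction):
    """Exact moments of h = X_1 and of E{h | T}, T = sum(X), for a b(1, p) sample."""
    outcomes = list(product((0, 1), repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in outcomes}
    # E{X_1 | T = t}: average of x[0] over outcomes with sum t, weighted by probability
    cond = {}
    for t in range(n + 1):
        pts = [x for x in outcomes if sum(x) == t]
        pt = sum(prob[x] for x in pts)
        cond[t] = sum(prob[x] * x[0] for x in pts) / pt

    def moments(f):
        mean = sum(prob[x] * f(x) for x in outcomes)
        var = sum(prob[x] * (f(x) - mean) ** 2 for x in outcomes)
        return mean, var

    return cond, moments(lambda x: x[0]), moments(lambda x: cond[sum(x)])


cond, (m_h, v_h), (m_rb, v_rb) = rao_blackwell_bernoulli(4, Fraction(1, 3))
print(cond)        # E{X_1 | T = t} for t = 0, ..., 4
print(m_h, v_h)    # mean and variance of h = X_1
print(m_rb, v_rb)  # mean and variance of E{X_1 | T}
```

The conditional expectation comes out as $t/n$, and its variance $p(1-p)/n$ is smaller than $p(1-p)$, as (9) requires.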


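Theorem 6 is applied along with completeness to yield the following result.

Theorem 7 (Lehmann-Scheffé [62]). If $T$ is a complete sufficient statistic and there exists an unbiased estimator $h$ of $\theta$, there exists a unique UMVUE of $\theta$, which is given by $E\{h \mid T\}$.

Proof. If $h_1, h_2 \in \mathcal U$, then $E\{h_1 \mid T\}$ and $E\{h_2 \mid T\}$ are both unbiased and
$$E_\theta[E\{h_1 \mid T\} - E\{h_2 \mid T\}] = 0 \quad \text{for all } \theta \in \Theta.$$
Since $T$ is a complete sufficient statistic, it follows that $E\{h_1 \mid T\} = E\{h_2 \mid T\}$ with probability 1, so that every unbiased estimator $h$ leads to the same estimator $E\{h \mid T\}$. By Theorem 6, $\operatorname{var}_\theta E\{h \mid T\} \le \operatorname{var}_\theta h$ for every $h \in \mathcal U$, and hence $E\{h \mid T\}$ is the UMVUE.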


Remark 5. According to Theorem 6, we should restrict our search to Borel-measurable functions of a sufficient statistic (whenever it exists). According to Theorem 7, if a complete sufficient statistic $T$ exists, all we need to do is to find a Borel-measurable function of $T$ that is unbiased. If a complete sufficient statistic does not exist, a UMVUE may still exist (see Example 11).


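Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. $X_1$ is unbiased for $\theta$. However, $\bar X = n^{-1}\sum_{i=1}^n X_i$ is a complete sufficient statistic, so that $E\{X_1 \mid \bar X\}$ is the UMVUE. We will show that $E\{X_1 \mid \bar X\} = \bar X$. Let $Y = n\bar X$. Then $Y$ is $N(n\theta, n)$, $X_1$ is $N(\theta, 1)$, and $(X_1, Y)$ is a bivariate normal RV with variance-covariance matrix
$$\begin{pmatrix} 1 & 1\\ 1 & n\end{pmatrix}.$$
Therefore,
$$E\{X_1 \mid y\} = EX_1 + \frac{\operatorname{cov}(X_1, Y)}{\operatorname{var}(Y)}(y - EY) = \theta + \frac{1}{n}(y - n\theta) = \frac{y}{n},$$
as asserted.

If we let $\psi(\theta) = \theta^2$, we can show similarly that $\bar X^2 - 1/n$ is the UMVUE for $\psi(\theta)$. Note that $\bar X^2 - 1/n$ may occasionally be negative, so that a UMVUE for $\theta^2$ is not very sensible in this case.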


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic. The UMVUE for $p$ is clearly $\bar X$. To find the UMVUE for $\psi(p) = p(1 - p)$, we have $E(nT) = n^2p$, $ET^2 = np + n(n-1)p^2$, so that $E(nT - T^2) = n(n-1)p(1 - p)$, and it follows that $(nT - T^2)/[n(n-1)]$ is the UMVUE for $\psi(p) = p(1 - p)$.
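The unbiasedness claim in Example 8 can be confirmed by summing over the $b(n, p)$ distribution of $T$ with exact rational arithmetic. This is a sketch; the values of $n$ and $p$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb


def expectation_of_umvue(n: int, p: Fraction) -> Fraction:
    """E_p[(nT - T^2) / (n(n-1))] with T ~ b(n, p), summed exactly."""
    total = Fraction(0)
    for t in range(n + 1):
        pmf = comb(n, t) * p ** t * (1 - p) ** (n - t)
        total += pmf * Fraction(n * t - t * t, n * (n - 1))
    return total


for n in (2, 5, 9):
    for p in (Fraction(1, 4), Fraction(2, 3)):
        print(n, p, expectation_of_umvue(n, p), p * (1 - p))
```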


Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then $(\bar X, S^2)$ is a complete sufficient statistic for $(\mu, \sigma^2)$. $\bar X$ is the UMVUE for $\mu$, and $S^2$ is the UMVUE for $\sigma^2$. Also, $k(n)S$ is the UMVUE for $\sigma$, where $k(n) = \sqrt{n-1}\,\Gamma[(n-1)/2]/[\sqrt 2\,\Gamma(n/2)]$ (see Example 1). We wish to find the UMVUE for the $p$th quantile $\zeta_p$. We have
$$p = P\{X \le \zeta_p\} = P\left\{Z \le \frac{\zeta_p - \mu}{\sigma}\right\},$$
where $Z$ is $N(0, 1)$. Thus $\zeta_p = \sigma z_{1-p} + \mu$ (here $z_{1-p}$ satisfies $P\{Z \le z_{1-p}\} = p$), and the UMVUE is
$$T(X_1, X_2, \ldots, X_n) = z_{1-p}\,k(n)S + \bar X.$$
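A minimal sketch of the estimator in Example 9, assuming, as the derivation requires, that $z_{1-p}$ is the point with $P\{Z \le z_{1-p}\} = p$, which is what `statistics.NormalDist().inv_cdf(p)` returns. The data values are made up for illustration, and `k_factor` and `quantile_umvue` are hypothetical names.

```python
import math
from statistics import NormalDist, mean, stdev


def k_factor(n: int) -> float:
    """k(n) = sqrt(n-1) Gamma((n-1)/2) / (sqrt(2) Gamma(n/2)), so that E[k(n) S] = sigma."""
    return math.sqrt((n - 1) / 2) * math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2))


def quantile_umvue(sample, p: float) -> float:
    """UMVUE z k(n) S + X-bar of the p-th quantile of N(mu, sigma^2), with P{Z <= z} = p."""
    n = len(sample)
    z = NormalDist().inv_cdf(p)
    return z * k_factor(n) * stdev(sample) + mean(sample)  # stdev uses divisor n - 1


data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]  # illustrative numbers only
print(quantile_umvue(data, 0.9))
```

For $p = 1/2$ the estimator reduces to $\bar X$, since $z_{1/2} = 0$.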
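Example 10 (Stigler [109]). We return to Example 8.3.14. We have seen that the family $\{P_N^{X_{(n)}} : N \ge 1\}$ of PMFs of $X_{(n)} = \max_{1 \le j \le n} X_j$ is complete and $X_{(n)}$ is sufficient for $N$. Now $EX_1 = (N + 1)/2$, so that $T(X_1) = 2X_1 - 1$ is unbiased for $N$. It follows from Theorem 7 that $E\{T(X_1) \mid X_{(n)}\}$ is the UMVUE of $N$. We have
$$P\{X_1 = x_1 \mid X_{(n)} = y\} = \begin{cases} \dfrac{y^{n-1} - (y-1)^{n-1}}{y^n - (y-1)^n}, & x_1 = 1, 2, \ldots, y - 1,\\[2ex] \dfrac{y^{n-1}}{y^n - (y-1)^n}, & x_1 = y.\end{cases}$$
Thus, using $\sum_{x_1=1}^{y-1}(2x_1 - 1) = (y-1)^2$,
$$E\{T(X_1) \mid X_{(n)} = y\} = \frac{(y-1)^2[y^{n-1} - (y-1)^{n-1}] + (2y - 1)y^{n-1}}{y^n - (y-1)^n} = \frac{y^{n+1} - (y-1)^{n+1}}{y^n - (y-1)^n},$$
and $[X_{(n)}^{n+1} - (X_{(n)} - 1)^{n+1}]/[X_{(n)}^n - (X_{(n)} - 1)^n]$ is the UMVUE of $N$.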
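The UMVUE of $N$ in Example 10 can be checked for unbiasedness exactly: with $P\{X_{(n)} = y\} = [y^n - (y-1)^n]/N^n$ the expectation telescopes to $N$. The sketch below, with illustrative values of $N$ and $n$, carries out that sum with `fractions.Fraction`; for $n = 1$ the estimator reduces to $2y - 1$.

```python
from fractions import Fraction


def umvue_N(y: int, n: int) -> Fraction:
    """[y^(n+1) - (y-1)^(n+1)] / [y^n - (y-1)^n], the UMVUE of N in Example 10."""
    return Fraction(y ** (n + 1) - (y - 1) ** (n + 1), y ** n - (y - 1) ** n)


def expected_umvue(N: int, n: int) -> Fraction:
    """E_N of the UMVUE, using P{X_(n) = y} = [y^n - (y-1)^n] / N^n."""
    return sum(Fraction(y ** n - (y - 1) ** n, N ** n) * umvue_N(y, n) for y in range(1, N + 1))


for N in (1, 4, 10):
    for n in (1, 3, 6):
        print(N, n, expected_umvue(N, n))
```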


If we consider the family $\mathcal P$ instead, we have seen (Example 8.3.14 and Problem 8.3.6) that $\mathcal P$ is not complete. The UMVUE for the family $\{P_N : N \ge 1\}$ is $T(X_1) = 2X_1 - 1$, which is not the UMVUE for $\mathcal P$. The UMVUE for $\mathcal P$ is, in fact, given by
$$T_1(k) = \begin{cases} 2k - 1, & k \ne n_0,\ k \ne n_0 + 1,\\ 2n_0, & k = n_0 \text{ or } k = n_0 + 1.\end{cases}$$
The reader is asked to check that $T_1$ has covariance 0 with all unbiased estimators $g$ of 0 that are of the form described in Example 8.3.14 and Problem 8.3.6, and hence Theorem 1 implies that $T_1$ is the UMVUE. Actually, $T_1(X_1)$ is a complete sufficient statistic for $\mathcal P$. Since $E_{n_0}T_1(X_1) = n_0 + 1/n_0$, $T_1$ is not even unbiased for the family $\{P_N : N \ge 1\}$. The minimum variance is given by
$$\operatorname{var}_N(T_1(X_1)) = \operatorname{var}_N(T(X_1)) \quad \text{if } N < n_0,$$
The following example shows that a UMVUE may exist, whereas a complete sufficient statistic may not.


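Example 11. Let $X$ be an RV with PMF
$$P_\theta(X = -1) = \theta \quad \text{and} \quad P_\theta(X = x) = (1 - \theta)^2\theta^x, \quad x = 0, 1, 2, \ldots,$$
where $0 < \theta < 1$. Let $\psi(\theta) = P_\theta(X = 0) = (1 - \theta)^2$. Then $X$ is clearly sufficient, in fact minimal sufficient, for $\theta$, but since
$$E_\theta X = (-1)\theta + \sum_{x=0}^\infty x(1 - \theta)^2\theta^x = -\theta + \theta(1 - \theta)^2\sum_{x=1}^\infty x\theta^{x-1} = -\theta + \frac{\theta(1 - \theta)^2}{(1 - \theta)^2} = 0,$$
it follows that $X$ is not complete for $\{P_\theta : 0 < \theta < 1\}$. We will use Theorem 1 to check if a UMVUE for $\psi(\theta)$ exists. Suppose that
$$E_\theta h(X) = \theta h(-1) + \sum_{x=0}^\infty (1 - \theta)^2\theta^xh(x) = 0$$
for all $0 < \theta < 1$. Then, for $0 < \theta < 1$,
$$0 = \theta h(-1) + \sum_{x=0}^\infty \theta^xh(x) - 2\sum_{x=0}^\infty \theta^{x+1}h(x) + \sum_{x=0}^\infty \theta^{x+2}h(x)$$
$$= h(0) + \theta[h(-1) - 2h(0) + h(1)] + \sum_{x=2}^\infty \theta^x[h(x) - 2h(x-1) + h(x-2)],$$
which is a power series in $\theta$. It follows that $h(0) = 0$, and for $x \ge 0$, $h(x+1) - 2h(x) + h(x-1) = 0$. Thus
$$h(1) = -h(-1), \qquad h(2) = 2h(1) - h(0) = -2h(-1),$$
$$h(3) = 2h(2) - h(1) = -4h(-1) + h(-1) = -3h(-1),$$
and so on. Consequently, all unbiased estimators of zero are of the form $h(X) = cX$, with $c = -h(-1)$. Clearly, $T(X) = 1$ if $X = 0$, and $= 0$ otherwise, is unbiased for $\psi(\theta)$. Moreover, since $XT(X) = 0$, for all $\theta$
$$E_\theta\{cX \cdot T(X)\} = 0,$$
so that, by Theorem 1, $T$ is the UMVUE of $\psi(\theta)$.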
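As a numerical sanity check of Example 11, the sketch below (truncating the series at an assumed 2000 terms) confirms that $E_\theta X = 0$, that $T(X) = 1$ if $X = 0$ and $0$ otherwise has mean $(1 - \theta)^2$, and that $E_\theta[X\,T(X)] = 0$, which is the condition of Theorem 1 for every unbiased estimator $cX$ of zero. The function name is hypothetical.

```python
def moments_example_11(theta: float, terms: int = 2000):
    """E X, E T(X), E[X T(X)] under P(X = -1) = theta, P(X = x) = (1 - theta)^2 theta^x."""
    def pmf(x):
        return theta if x == -1 else (1 - theta) ** 2 * theta ** x

    def T(x):
        return 1.0 if x == 0 else 0.0  # unbiased estimator of (1 - theta)^2

    support = range(-1, terms)  # x = -1, 0, 1, ..., terms - 1
    e_x = sum(x * pmf(x) for x in support)
    e_t = sum(T(x) * pmf(x) for x in support)
    e_xt = sum(x * T(x) * pmf(x) for x in support)
    return e_x, e_t, e_xt


for theta in (0.1, 0.5, 0.9):
    print(theta, moments_example_11(theta), (1 - theta) ** 2)
```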


We conclude this section with a proof of Theorem 8.3.4. 
Theorem 8 (Theorem 8.3.4). A complete sufficient statistic is a minimal sufficient statistic.

Proof. Let $S(\mathbf X)$ be a complete sufficient statistic for $\{f_\theta : \theta \in \Theta\}$ and let $T$ be any statistic for which $E_\theta T^2 < \infty$. Writing $h(S) = E_\theta\{T \mid S\}$, we see that $h$ is the UMVUE of $E_\theta T$. Let $S_1(\mathbf X)$ be another sufficient statistic. We show that $h(S)$ is a function of $S_1$. If not, then $h_1(S_1) = E_\theta\{h(S) \mid S_1\}$ is unbiased for $E_\theta T$ and, by the Rao-Blackwell theorem,
$$\operatorname{var}_\theta h_1(S_1) < \operatorname{var}_\theta h(S),$$
contradicting the fact that $h(S)$ is the UMVUE for $E_\theta T$. It follows that $h(S)$ is a function of $S_1$. Since $h$ and $S_1$ are arbitrary, $S$ must be a function of every sufficient statistic and hence minimal sufficient.


PROBLEMS 8.4 


1. Let $X_1, X_2, \ldots, X_n$ ($n \ge 2$) be a sample from $b(1, p)$. Find an unbiased estimator for $\psi(p) = p^2$.


2. Let $X_1, X_2, \ldots, X_n$ ($n \ge 2$) be a sample from $N(\mu, \sigma^2)$. Find an unbiased estimator for $\sigma^p$, where $p + n > 1$. Find a minimum MSE estimator of $\sigma^p$.


3. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Find a minimum MSE estimator of the form $\alpha S^2$ for the parameter $\sigma^2$. Compare the variances of the minimum MSE estimator and the obvious estimator $S^2$.


4. Let $X \sim b(1, \theta^2)$. Does there exist an unbiased estimator of $\theta$?
5. Let $X \sim P(\lambda)$. Does there exist an unbiased estimator of $\psi(\lambda) = \lambda^{-1}$?


6. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, p)$, $0 < p < 1$, and $0 < s < n$ be an integer. Find the UMVUE for (a) $\psi(p) = p^s$, and (b) $\psi(p) = p^s + (1 - p)^{n-s}$.


7. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with mean $\theta$ and finite variance, and $T$ be an estimator of $\theta$ of the form $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n a_iX_i$. If $T$ is an unbiased estimator of $\theta$ that has minimum variance and $T'$ is another linear unbiased estimator of $\theta$, then
$$\operatorname{cov}_\theta(T, T') = \operatorname{var}_\theta(T).$$


8. Let $T_1, T_2$ be two unbiased estimators having common variance $\alpha\sigma^2$ ($\alpha > 1$), where $\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient between $T_1$ and $T_2$ is $\ge (2 - \alpha)/\alpha$.


9. Let $X \sim NB(1; \theta)$ and $d(\theta) = P_\theta\{X = 0\}$. Let $X_1, X_2, \ldots, X_n$ be a sample on $X$. Find the UMVUE of $d(\theta)$.
10. This example covers most discrete distributions. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF
$$P_\theta\{X = x\} = \frac{\alpha(x)\theta^x}{f(\theta)}, \qquad x = 0, 1, 2, \ldots,$$
where $\theta > 0$, $\alpha(x) > 0$, $f(\theta) = \sum_{x=0}^\infty \alpha(x)\theta^x$, $\alpha(0) = 1$, and let $T = X_1 + X_2 + \cdots + X_n$. Write
$$c(t, n) = \sum_{\substack{x_1, x_2, \ldots, x_n\\ \sum_{i=1}^n x_i = t}}\ \prod_{i=1}^n \alpha(x_i).$$
Show that $T$ is a complete sufficient statistic for $\theta$ and that the UMVUE for $d(\theta) = \theta^r$ ($r > 0$ is an integer) is given by
$$Y_r(t) = \begin{cases} 0 & \text{if } t < r,\\[1ex] \dfrac{c(t - r, n)}{c(t, n)} & \text{if } t \ge r.\end{cases}$$
(Roy and Mitra [92])
11. Let $X$ be a hypergeometric RV with PMF
$$P_M\{X = x\} = \binom{N}{n}^{-1}\binom{M}{x}\binom{N - M}{n - x},$$
where $\max(0, M + n - N) \le x \le \min(M, n)$.
(a) Find the UMVUE for $M$ when $N$ is assumed to be known.
(b) Does there exist an unbiased estimator of $N$ ($M$ known)?


12. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\lambda)$ RVs, $\lambda > 0$. Find the UMVUE of $P_\lambda\{X_1 \le t_0\}$, where $t_0 > 0$ is a fixed real number.


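13. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Let $\psi(\lambda) = \sum_{k=0}^\infty c_k\lambda^k$ be a parametric function. Find the UMVUE for $\psi(\lambda)$. In particular, find the UMVUE for (a) $\psi(\lambda) = 1/(1 - \lambda)$, (b) $\psi(\lambda) = \lambda^s$ for some fixed integer $s > 0$, (c) $\psi(\lambda) = P_\lambda\{X = 0\}$, and (d) $\psi(\lambda) = P_\lambda\{X = 0 \text{ or } 1\}$.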


14. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF
$$P_N(x) = \frac{1}{N}, \qquad x = 1, 2, \ldots, N.$$
Let $\psi(N)$ be a function of $N$. Find the UMVUE of $\psi(N)$.
15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find the UMVUE of $\psi(\lambda) = P_\lambda\{X = k\}$, where $k$ is a fixed positive integer.


16. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Assume that $\mu_1 = \mu_2 = \mu$, and it is required to find an unbiased estimator of $\mu$. Since a complete sufficient statistic does not exist, consider the class of all linear unbiased estimators
$$\hat\mu(\alpha) = \alpha\bar X + (1 - \alpha)\bar Y.$$
(a) Find the variance of $\hat\mu$.
(b) Choose $\alpha = \alpha_0$ to minimize $\operatorname{var}(\hat\mu)$, and consider the estimator
$$\hat\mu_0 = \alpha_0\bar X + (1 - \alpha_0)\bar Y.$$
Compute $\operatorname{var}(\hat\mu_0)$. If $\sigma_1 = \sigma_2$, the BLUE of $\mu$ (in the sense of minimum variance) is
$$\hat\mu_1 = \frac{\bar X + \bar Y}{2},$$
irrespective of whether $\sigma_1$ and $\rho$ are known or unknown.
(c) If $\sigma_1 \ne \sigma_2$ and $\rho, \sigma_1, \sigma_2$ are unknown, replace these values in $\alpha_0$ by their corresponding estimators. Let
$$\hat\alpha = \frac{S_2^2 - S_{11}}{S_1^2 + S_2^2 - 2S_{11}}.$$
Show that
$$\hat\mu_2 = \bar Y + (\bar X - \bar Y)\hat\alpha$$
is an unbiased estimator of $\mu$.


17. Let $X_1, X_2, \ldots, X_n$ be iid $N(\theta, 1)$. Let $p = \Phi(x - \theta)$, where $\Phi$ is the DF of an $N(0, 1)$ RV. Show that the UMVUE of $p$ is given by $\Phi\left((x - \bar X)\sqrt{n/(n-1)}\right)$.


18. Prove Theorem 5. 


19. In Example 10 show that $T_1$ is the UMVUE for $N$ (restricted to the family $\mathcal P$), and compute the minimum variance.


20. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate population with finite variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\gamma$. Show that
$$\operatorname{var}(S_{11}) = \frac{1}{n}\left(\mu_{22} - \frac{n-2}{n-1}\gamma^2 + \frac{\sigma_1^2\sigma_2^2}{n-1}\right),$$
where $\mu_{22} = E[(X - EX)^2(Y - EY)^2]$. It is assumed that appropriate order moments exist.

21. Suppose that a random sample is taken on $(X, Y)$ and it is desired to estimate $\gamma$, the unknown covariance between $X$ and $Y$. Suppose that for some reason a set $S$ of $n$ observations is available on both $X$ and $Y$, an additional $n_1 - n$ observations are available on $X$ but the corresponding $Y$ values are missing, and an additional $n_2 - n$ observations of $Y$ are available for which the $X$ values are missing. Let $S_1$ be the set of all $n_1$ ($\ge n$) $X$ values, and $S_2$ the set of all $n_2$ ($\ge n$) $Y$ values, and write
$$\bar X = \frac{\sum_{i \in S_1} X_i}{n_1}, \quad \bar Y = \frac{\sum_{j \in S_2} Y_j}{n_2}, \quad \bar x = \frac{\sum_{i \in S} X_i}{n}, \quad \bar y = \frac{\sum_{i \in S} Y_i}{n}.$$
Show that
$$\hat\gamma = \frac{n_1n_2}{n(n_1n_2 - n_1 - n_2 + n)}\sum_{i \in S}(X_i - \bar X)(Y_i - \bar Y)$$
is an unbiased estimator of $\gamma$. Find the variance of $\hat\gamma$, and show that $\operatorname{var}(\hat\gamma) \le \operatorname{var}(S_{11})$, where $S_{11}$ is the usual unbiased estimator of $\gamma$ based on the $n$ observations in $S$. (Boas [10])

22. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = \exp(-x + \theta)$, $x > \theta$. Let $x_0$ be a fixed real number. Find the UMVUE of $f_\theta(x_0)$.

23. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. Let $T(\mathbf X) = \sum_{i=1}^n X_i$. Show that $\varphi(x; t/n, (n-1)/n)$ is the UMVUE of $\varphi(x; \mu, 1)$, where $\varphi(x; \mu, \sigma^2)$ is the PDF of an $N(\mu, \sigma^2)$ RV.

24. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$ RVs. Show that the UMVUE of $f(x; \theta) = (1/\theta)\exp(-x/\theta)$, $x > 0$, is given by $h(x \mid t)$, the conditional PDF of $X_1$ given $T(\mathbf X) = \sum_{i=1}^n X_i = t$, where
$$h(x \mid t) = \frac{(n-1)(t - x)^{n-2}}{t^{n-1}} \quad \text{for } x < t, \quad \text{and} = 0 \text{ for } x \ge t.$$

25. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = 1/(2\theta)$, $|x| < \theta$, and $= 0$ elsewhere. Show that $T(\mathbf X) = \max(-X_{(1)}, X_{(n)})$ is a complete sufficient statistic for $\theta$. Find the UMVUE of $\theta^2$.

26. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF
$$f_\theta(x) = \frac{1}{\sigma}\exp\left[-\frac{x - \mu}{\sigma}\right], \qquad x > \mu,\ \sigma > 0,$$
where $\theta = (\mu, \sigma)$.
(a) Show that $\left(X_{(1)}, \sum_{i=1}^n (X_i - X_{(1)})\right)$ is a complete sufficient statistic for $\theta$.
(b) Show that the UMVUEs of $\mu$ and $\sigma$ are given by
$$\hat\mu = X_{(1)} - \frac{1}{n(n-1)}\sum_{i=1}^n (X_i - X_{(1)}), \qquad \hat\sigma = \frac{1}{n-1}\sum_{i=1}^n (X_i - X_{(1)}).$$
(c) Find the UMVUE of $\psi(\mu, \sigma) = E_{\mu,\sigma}X_1$.
(d) Show that the UMVUE of $P_\theta\{X_1 \ge t\}$, $t \ge X_{(1)}$, is given by
$$\hat P = \frac{n-1}{n}\left(\left[1 - \frac{t - X_{(1)}}{\sum_{i=1}^n (X_i - X_{(1)})}\right]^+\right)^{n-2},$$
where $x^+ = \max(x, 0)$.


8.5 UNBIASED ESTIMATION (CONTINUED): LOWER BOUND FOR 
THE VARIANCE OF AN ESTIMATOR 


In this section we consider two inequalities, each of which provides a lower bound 
for the variance of an estimator. These inequalities can sometimes be used to show 
that an unbiased estimator is the UMVUE. We first consider an inequality due to 
Fréchet, Cramér, and Rao (the FCR inequality). 


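Theorem 1 (Cramér [17], Fréchet [31], Rao [84]). Let $\Theta \subseteq \mathcal R$ be an open interval and suppose that the family $\{f_\theta : \theta \in \Theta\}$ satisfies the following regularity conditions:

(i) It has common support set $S$. Thus $S = \{\mathbf x : f_\theta(\mathbf x) > 0\}$ does not depend on $\theta$.

(ii) For $\mathbf x \in S$ and $\theta \in \Theta$, the derivative $\frac{\partial}{\partial\theta}\log f_\theta(\mathbf x)$ exists and is finite.

(iii) For any statistic $h$ with $E_\theta|h(\mathbf X)| < \infty$ for all $\theta$, the operations of integration (summation) and differentiation with respect to $\theta$ can be interchanged in $E_\theta h(\mathbf X)$. That is,
$$\frac{\partial}{\partial\theta}\int h(\mathbf x)f_\theta(\mathbf x)\,d\mathbf x = \int h(\mathbf x)\frac{\partial}{\partial\theta}f_\theta(\mathbf x)\,d\mathbf x \tag{1}$$
whenever the right-hand side of (1) is finite.

Let $T(\mathbf X)$ be such that $\operatorname{var}_\theta T(\mathbf X) < \infty$ for all $\theta$ and set $\psi(\theta) = E_\theta T(\mathbf X)$. If $I(\theta) = E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right]^2$ satisfies $0 < I(\theta) < \infty$, then
$$\operatorname{var}_\theta T(\mathbf X) \ge \frac{[\psi'(\theta)]^2}{I(\theta)}. \tag{2}$$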
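Proof. Since (iii) holds for $h \equiv 1$, we get
$$0 = \frac{\partial}{\partial\theta}\int_S f_\theta(\mathbf x)\,d\mathbf x = \int_S \frac{\partial}{\partial\theta}f_\theta(\mathbf x)\,d\mathbf x = \int_S \left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf x)\right]f_\theta(\mathbf x)\,d\mathbf x = E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right]. \tag{3}$$
Differentiating $\psi(\theta) = E_\theta T(\mathbf X)$ and using (1), we get
$$\psi'(\theta) = \int_S T(\mathbf x)\frac{\partial}{\partial\theta}f_\theta(\mathbf x)\,d\mathbf x = \int_S T(\mathbf x)\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf x)\right]f_\theta(\mathbf x)\,d\mathbf x = \operatorname{cov}_\theta\left[T(\mathbf X), \frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right], \tag{4}$$
where the last step uses (3). Also, in view of (3), we have
$$\operatorname{var}_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right] = E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right]^2,$$
and using the Cauchy-Schwarz inequality in (4), we get
$$[\psi'(\theta)]^2 \le \operatorname{var}_\theta T(\mathbf X)\,E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right]^2,$$
which proves (2). Practically the same proof may be given when $f_\theta$ is a PMF by replacing $\int$ by $\sum$.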


Remark 1. If, in particular, $\psi(\theta) = \theta$, then (2) reduces to
$$\operatorname{var}_\theta(T(\mathbf X)) \ge \frac{1}{I(\theta)}. \tag{5}$$


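Remark 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF (PMF) $f_\theta(x)$. Then
$$E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf X)\right]^2 = E_\theta\left[\sum_{i=1}^n \frac{\partial}{\partial\theta}\log f_\theta(X_i)\right]^2 = \sum_{i=1}^n E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X_i)\right]^2 = nE_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X_1)\right]^2 = nI_1(\theta),$$
where the cross-product terms vanish by independence and (3), and $I_1(\theta) = E_\theta[\partial\log f_\theta(X_1)/\partial\theta]^2$. In this case the inequality (2) reduces to
$$\operatorname{var}_\theta(T(\mathbf X)) \ge \frac{[\psi'(\theta)]^2}{nI_1(\theta)}.$$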





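Definition 1. The quantity
$$I_1(\theta) = E_\theta\left[\frac{\partial\log f_\theta(X_1)}{\partial\theta}\right]^2 \tag{6}$$
is called Fisher information in $X_1$ and
$$I_n(\theta) = E_\theta\left[\frac{\partial\log f_\theta(\mathbf X)}{\partial\theta}\right]^2 = nI_1(\theta) \tag{7}$$
is known as Fisher information in the random sample $X_1, X_2, \ldots, X_n$.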


Remark 3. As $n$ gets larger, the lower bound for $\operatorname{var}_\theta(T(\mathbf X))$ gets smaller. Thus, as the Fisher information increases, the lower bound decreases and the "best" estimator [one for which equality holds in (2)] will have smaller variance; in this sense the sample carries more information about $\theta$.


Remark 4. Regularity condition (i) is unnecessarily restrictive. An examination of the proof shows that it is only necessary that (ii) and (iii) hold for (2) to hold. Condition (i) excludes distributions such as $f_\theta(x) = 1/\theta$, $0 < x < \theta$, for which (3) fails to hold. It also excludes densities such as $f_\theta(x) = 1$, $\theta < x < \theta + 1$, or $f_\theta(x) = (2/\pi)\sin^2(x + \theta)$, $\theta \le x \le \theta + \pi$, each of which satisfies (iii) for $h \equiv 1$, so that (3) holds, but not (1) for all $h$ with $E_\theta|h| < \infty$.


Remark 5. Sufficient conditions for regularity condition (iii) may be found in most calculus textbooks. For example, if (i) and (ii) hold, then (iii) holds provided that, for all $h$ with $E_\theta|h| < \infty$ for all $\theta \in \Theta$, both $E_\theta\{h(\mathbf X)[\partial\log f_\theta(\mathbf X)/\partial\theta]\}$ and $E_\theta|h(\mathbf X)[\partial\log f_\theta(\mathbf X)/\partial\theta]|$ are continuous functions of $\theta$. Regularity conditions (i) to (iii) are satisfied for a one-parameter exponential family.


Remark 6. The inequality (2) holds trivially if $I(\theta) = \infty$ [and $\psi'(\theta)$ is finite] or if $\operatorname{var}_\theta(T(\mathbf X)) = \infty$.


Example 1. Let $X \sim b(n, p)$; $\Theta = (0, 1) \subset \mathcal R$. Here the Fisher information may be obtained as follows:
$$\log f_p(x) = \log\binom{n}{x} + x\log p + (n - x)\log(1 - p),$$
$$\frac{\partial\log f_p(x)}{\partial p} = \frac{x}{p} - \frac{n - x}{1 - p},$$
and
$$I(p) = E_p\left[\frac{\partial\log f_p(X)}{\partial p}\right]^2 = E_p\left[\frac{X - np}{p(1 - p)}\right]^2 = \frac{n}{p(1 - p)}.$$
Let $\psi(p)$ be a function of $p$ and $T(X)$ be an unbiased estimator of $\psi(p)$. The only condition that need be checked is differentiability under the summation sign. We have
$$\psi(p) = E_pT(X) = \sum_{x=0}^n \binom{n}{x}T(x)p^x(1 - p)^{n-x},$$
which is a polynomial in $p$ and hence can be differentiated with respect to $p$. For any unbiased estimator $T(X)$ of $p$, we have
$$\operatorname{var}_p(T(X)) \ge \frac{p(1 - p)}{n} = \frac{1}{I(p)},$$
and since
$$\operatorname{var}\left(\frac{X}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n},$$
it follows that the variance of the estimator $X/n$ attains the lower bound of the FCR inequality, and hence $X/n$ has least variance among all unbiased estimators of $p$. Thus $T(X) = X/n$ is the UMVUE for $p$.
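The Fisher information and the attainment of the bound in Example 1 can be verified by direct summation over the binomial PMF. This sketch uses assumed values $n = 12$, $p = 0.3$; the function names are illustrative.

```python
from math import comb


def binomial_fisher_information(n: int, p: float) -> float:
    """I(p) = E_p[(d/dp log f_p(X))^2], summed over x = 0, ..., n."""
    total = 0.0
    for x in range(n + 1):
        pmf = comb(n, x) * p ** x * (1 - p) ** (n - x)
        score = x / p - (n - x) / (1 - p)
        total += pmf * score ** 2
    return total


def variance_of_x_over_n(n: int, p: float) -> float:
    """var(X/n) computed directly from the b(n, p) PMF."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    m = sum(x / n * w for x, w in enumerate(pmf))
    return sum((x / n - m) ** 2 * w for x, w in enumerate(pmf))


n, p = 12, 0.3
print(binomial_fisher_information(n, p), n / (p * (1 - p)))
print(variance_of_x_over_n(n, p), 1 / binomial_fisher_information(n, p))
```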


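Example 2. Let $X \sim P(\lambda)$. We leave the reader to check that the regularity conditions are satisfied and
$$\operatorname{var}_\lambda(T(X)) \ge \lambda$$
for any unbiased estimator $T(X)$ of $\lambda$. Since $T(X) = X$ has variance $\lambda$, $X$ is the UMVUE of $\lambda$. Similarly, if we take a sample of size $n$ from $P(\lambda)$, we can show that
$$I_n(\lambda) = \frac{n}{\lambda} \quad \text{and} \quad \operatorname{var}_\lambda(T(X_1, \ldots, X_n)) \ge \frac{\lambda}{n},$$
and $\bar X$ is the UMVUE.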
Let us next consider the problem of unbiased estimation of (A) = e^, based 
on a sample of size 1. The estimator 


1 if X = 0, 


д (X) = 
uL | if X >1, 
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is unbiased for (А) since 
Ед (X) = Ед (Х)] = РХ = 0} = e 
Also, 
var, (д (X)) =e *(1 —e™). 
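To compute the FCR lower bound, we have

$$\log f_\lambda(x) = x\log\lambda - \lambda - \log x!.$$

This has to be differentiated with respect to $e^{-\lambda}$, since we want a lower bound for an estimator of the parameter $e^{-\lambda}$. Let $\theta = e^{-\lambda}$. Then

$$\log f_\theta(x) = x\log\log\frac{1}{\theta} + \log\theta - \log x!,$$

$$\frac{\partial}{\partial\theta}\log f_\theta(x) = x\,\frac{1}{\log(1/\theta)}\left(-\frac{1}{\theta}\right) + \frac{1}{\theta} = \frac{1}{\theta}\left(1 - \frac{x}{\lambda}\right),$$

and

$$E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2 = \frac{1}{\theta^2}E_\lambda\left(1 - \frac{X}{\lambda}\right)^2 = \frac{1}{\theta^2\lambda} = \frac{1}{\lambda e^{-2\lambda}},$$

so that

$$\operatorname{var}_\theta T(X) \ge \lambda e^{-2\lambda},$$

where $\theta = e^{-\lambda}$.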


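Since $e^{-\lambda}(1 - e^{-\lambda}) > \lambda e^{-2\lambda}$ for $\lambda > 0$ (equivalently, $e^{\lambda} - 1 > \lambda$), we see that $\operatorname{var}(\delta(X))$ is greater than the lower bound obtained from the FCR inequality. We show next that $\delta(X)$ is the only unbiased estimator of $\theta$, and hence is the UMVUE.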

If $h$ is any unbiased estimator of $\theta$, it must satisfy $E_\theta h(X) = \theta$. That is, for all $\lambda > 0$,

$$e^{-\lambda} = \sum_{k=0}^{\infty}h(k)\,\frac{\lambda^k}{k!}\,e^{-\lambda}, \quad\text{that is,}\quad 1 = \sum_{k=0}^{\infty}h(k)\,\frac{\lambda^k}{k!}.$$

Equating coefficients of powers of $\lambda$ we see immediately that $h(0) = 1$ and $h(k) = 0$ for $k = 1, 2, \ldots$. It follows that $h(X) = \delta(X)$.

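The same computation can be carried out when $X_1, X_2, \ldots, X_n$ is a random sample from $P(\lambda)$. We leave it to the reader to show that the FCR lower bound for any unbiased estimator of $\theta = e^{-\lambda}$ is $\lambda e^{-2\lambda}/n$. The estimator $\sum_{i=1}^{n}\delta(X_i)/n$ is clearly unbiased for $e^{-\lambda}$ with variance $e^{-\lambda}(1 - e^{-\lambda})/n > \lambda e^{-2\lambda}/n$. The UMVUE of $e^{-\lambda}$ is given by $T_0 = [(n - 1)/n]^{\sum_{i=1}^{n}X_i}$ with $\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1) > \lambda e^{-2\lambda}/n$ for all $\lambda > 0$.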
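The three quantities compared above can be checked numerically. This Python sketch uses hypothetical helper names. It relies on the fact that $S = \sum X_i \sim P(n\lambda)$ and computes the mean and variance of $T_0 = [(n-1)/n]^S$ by summing over the Poisson PMF. Truncating the series at `kmax=400` is our own assumption and is ample for moderate $n\lambda$.

```python
import math

def poisson_pmfs(mu, kmax):
    # P{S = k}, k = 0..kmax, for S ~ P(mu), built by the recursion p_k = p_{k-1} * mu / k
    probs = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * mu / k)
    return probs

def umvue_mean_and_variance(lam, n, kmax=400):
    # T0 = ((n - 1)/n)**S, where S = X_1 + ... + X_n ~ P(n * lam)
    r = (n - 1) / n
    probs = poisson_pmfs(n * lam, kmax)
    m1 = sum(r**k * pk for k, pk in enumerate(probs))
    m2 = sum(r ** (2 * k) * pk for k, pk in enumerate(probs))
    return m1, m2 - m1**2

lam, n = 1.5, 5
fcr_bound = lam * math.exp(-2 * lam) / n                      # lambda e^{-2 lambda} / n
var_delta_mean = math.exp(-lam) * (1 - math.exp(-lam)) / n    # var of sum delta(X_i) / n
mean_T0, var_T0 = umvue_mean_and_variance(lam, n)
print(fcr_bound, var_T0, var_delta_mean)                      # printed in increasing order
```

The recursion for the Poisson probabilities avoids computing large factorials directly.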


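Corollary. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x)$. Suppose that the family $\{f_\theta : \theta \in \Theta\}$ satisfies the conditions of Theorem 1. Then equality holds in (2) if and only if, for all $\theta \in \Theta$,

$$T(\mathbf{x}) - \psi(\theta) = k(\theta)\,\frac{\partial}{\partial\theta}\log f_\theta(\mathbf{x}) \tag{8}$$

for some function $k(\theta)$.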


Proof. Recall that we derived (2) by an application of the Cauchy-Schwarz inequality, where equality holds if and only if (8) holds.


Remark 7. Integrating (8) with respect to $\theta$, we get

$$\log f_\theta(\mathbf{x}) = Q(\theta)T(\mathbf{x}) + S(\theta) + A(\mathbf{x})$$

for some functions $Q$, $S$, and $A$. It follows that $f_\theta$ is a one-parameter exponential family and the statistic $T$ is sufficient for $\theta$.


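Remark 8. A result that simplifies computations is the following. If $f_\theta$ is twice differentiable and $E_\theta[\partial\log f_\theta(X)/\partial\theta]$ can be differentiated under the expectation sign, then

$$I(\theta) = E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2 = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(X)\right]. \tag{9}$$

For the proof of (9), it is straightforward to check that

$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(x) = \frac{1}{f_\theta(x)}\,\frac{\partial^2 f_\theta(x)}{\partial\theta^2} - \left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right]^2.$$

Taking expectations on both sides, and noting that under the stated differentiability $E_\theta\{[\partial^2 f_\theta(X)/\partial\theta^2]/f_\theta(X)\} = \int\partial^2 f_\theta(x)/\partial\theta^2\,dx = 0$, we get (9).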


Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$. Then

$$\log f_\mu(x) = -\frac{1}{2}\log(2\pi) - \frac{(x - \mu)^2}{2},$$

$$\frac{\partial}{\partial\mu}\log f_\mu(x) = x - \mu,$$

and

$$\frac{\partial^2}{\partial\mu^2}\log f_\mu(x) = -1.$$

Hence $I(\mu) = 1$ and $I_n(\mu) = n$.
We next consider an inequality due to Chapman, Robbins, and Kiefer (the CRK inequality) that gives a lower bound for the variance of an estimator but does not require regularity conditions of the Fréchet-Cramér-Rao type.


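Theorem 2 (Chapman and Robbins [11], Kiefer [50]). Let $\Theta \subseteq \mathbb{R}$ and $\{f_\theta(x) : \theta \in \Theta\}$ be a class of PDFs (PMFs). Let $\psi$ be defined on $\Theta$, and let $T$ be an unbiased estimator of $\psi(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Theta$. If $\theta \ne \varphi$, assume that $f_\theta$ and $f_\varphi$ are different, and assume further that there exists a $\varphi \in \Theta$ such that $\theta \ne \varphi$ and

$$S(\theta) = \{f_\theta(x) > 0\} \supset S(\varphi) = \{f_\varphi(x) > 0\}. \tag{10}$$

Then

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\{\varphi:\ S(\varphi)\subset S(\theta),\ \varphi\ne\theta\}}\frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta[f_\varphi(X)/f_\theta(X)]} \tag{11}$$

for all $\theta \in \Theta$.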


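Proof. Since $T$ is unbiased for $\psi$, $E_\varphi T(X) = \psi(\varphi)$ for all $\varphi \in \Theta$. Hence, for $\varphi \ne \theta$,

$$\int_{S(\theta)}T(x)\left[\frac{f_\varphi(x) - f_\theta(x)}{f_\theta(x)}\right]f_\theta(x)\,dx = \psi(\varphi) - \psi(\theta), \tag{12}$$

which, since $E_\theta[f_\varphi(X)/f_\theta(X)] = \int_{S(\theta)}f_\varphi(x)\,dx = 1$ by (10), yields

$$\operatorname{cov}_\theta\left[T(X), \frac{f_\varphi(X)}{f_\theta(X)} - 1\right] = \psi(\varphi) - \psi(\theta).$$

Using the Cauchy-Schwarz inequality, we get

$$\operatorname{cov}_\theta^2\left[T(X), \frac{f_\varphi(X)}{f_\theta(X)} - 1\right] \le \operatorname{var}_\theta(T(X))\operatorname{var}_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)} - 1\right] = \operatorname{var}_\theta(T(X))\operatorname{var}_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)}\right].$$

Thus

$$\operatorname{var}_\theta(T(X)) \ge \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta[f_\varphi(X)/f_\theta(X)]},$$

and the result follows. In the discrete case it is necessary only to replace the integral in the left side of (12) by a sum. The rest of the proof needs no change.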


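Remark 9. Inequality (11) holds without any regularity conditions on $f_\theta$ or $\psi(\theta)$. We will show that it covers some nonregular cases of the FCR inequality. Sometimes (11) is available in an alternative form. Let $\theta$ and $\theta + \delta$ ($\delta \ne 0$) be any two distinct values in $\Theta$ such that $S(\theta + \delta) \subset S(\theta)$, and take $\psi(\theta) = \theta$. Write

$$J = J(\theta, \delta) = \frac{1}{\delta^2}\left\{\left[\frac{f_{\theta+\delta}(X)}{f_\theta(X)}\right]^2 - 1\right\},$$

so that $E_\theta J = \operatorname{var}_\theta[f_{\theta+\delta}(X)/f_\theta(X)]/\delta^2$ (because $E_\theta[f_{\theta+\delta}(X)/f_\theta(X)] = 1$). Then (11) can be written as

$$\operatorname{var}_\theta(T(X)) \ge \frac{1}{\inf_\delta E_\theta J}, \tag{13}$$

where the infimum is taken over all $\delta \ne 0$ such that $S(\theta + \delta) \subset S(\theta)$.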


Remark 10. Inequality (11) applies if the parameter space is discrete, but the 
Fréchet-Cramér-Rao regularity conditions do not hold in that case. 


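Example 4. Let $X$ be $U[0, \theta]$. The regularity conditions of the FCR inequality do not hold in this case. Let $\psi(\theta) = \theta$. If $\varphi < \theta$, then $S(\varphi) \subset S(\theta)$. Also,

$$E_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)}\right]^2 = \int_0^{\varphi}\left(\frac{\theta}{\varphi}\right)^2\frac{1}{\theta}\,dx = \frac{\theta}{\varphi},$$

so that $\operatorname{var}_\theta[f_\varphi(X)/f_\theta(X)] = \theta/\varphi - 1$. Thus

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\varphi<\theta}\frac{(\varphi - \theta)^2}{\theta/\varphi - 1} = \sup_{\varphi<\theta}\varphi(\theta - \varphi) = \frac{\theta^2}{4}$$

for any unbiased estimator $T(X)$ of $\theta$. $X$ is a complete sufficient statistic, and $2X$ is unbiased for $\theta$, so that $T(X) = 2X$ is the UMVUE. Also,

$$\operatorname{var}_\theta(2X) = 4\operatorname{var}X = \frac{4\theta^2}{12} = \frac{\theta^2}{3} > \frac{\theta^2}{4}.$$

Thus the lower bound $\theta^2/4$ of the CRK inequality is not achieved by any unbiased estimator of $\theta$.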
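For Example 4, the CRK supremum can be approximated by evaluating the ratio on a grid of $\varphi$ values. The sketch uses $E_\theta[(f_\varphi/f_\theta)^2] = \theta/\varphi$, so the variance in the denominator is $\theta/\varphi - 1$. The grid approximation, the helper names, and $\theta = 2$ are our own choices.

```python
def crk_ratio(phi, theta):
    # (phi - theta)^2 / var_theta[f_phi(X)/f_theta(X)] for U[0, theta] and psi(theta) = theta;
    # E_theta[(f_phi/f_theta)^2] = theta/phi and E_theta[f_phi/f_theta] = 1, so var = theta/phi - 1
    return (phi - theta) ** 2 / (theta / phi - 1)

def crk_bound_uniform(theta, grid_size=10_000):
    # approximate the supremum over phi in (0, theta) on an equally spaced grid
    return max(crk_ratio(theta * i / grid_size, theta) for i in range(1, grid_size))

theta = 2.0
bound = crk_bound_uniform(theta)   # phi = theta/2 is on the grid, giving theta**2 / 4
var_two_x = theta**2 / 3           # var_theta(2X) of the UMVUE 2X
print(bound, var_two_x)
```

The gap between `bound` and `var_two_x` shows that the CRK bound is not attained.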


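Example 5. Let $X$ have PMF

$$P_N\{X = k\} = \begin{cases}\dfrac{1}{N}, & k = 1, 2, \ldots, N,\\ 0, & \text{otherwise}.\end{cases}$$

Let $\Theta = \{N : N \ge M,\ M > 1 \text{ given}\}$. Take $\psi(N) = N$. Although the FCR regularity conditions do not hold, (11) is applicable since, for $N \ne N' \in \Theta \subset \mathbb{R}$,

$$S(N) = \{1, 2, \ldots, N\} \supset S(N') = \{1, 2, \ldots, N'\} \quad\text{if } N' < N.$$

Also, $P_N$ and $P_{N'}$ are different for $N \ne N'$. Thus

$$\operatorname{var}_N(T) \ge \sup_{M\le N'<N}\frac{(N - N')^2}{\operatorname{var}_N\{P_{N'}(X)/P_N(X)\}}.$$

Now

$$\frac{P_{N'}(x)}{P_N(x)} = \begin{cases}N/N', & x = 1, 2, \ldots, N',\ N' < N,\\ 0, & \text{otherwise},\end{cases}$$

$$E_N\left[\frac{P_{N'}(X)}{P_N(X)}\right]^2 = \left(\frac{N}{N'}\right)^2\frac{N'}{N} = \frac{N}{N'},$$

and

$$\operatorname{var}_N\left[\frac{P_{N'}(X)}{P_N(X)}\right] = \frac{N}{N'} - 1 > 0 \quad\text{for } N > N'.$$

It follows that

$$\operatorname{var}_N(T(X)) \ge \sup_{M\le N'<N}\frac{(N - N')^2}{(N - N')/N'} = \sup_{M\le N'<N}N'(N - N').$$

Now

$$\frac{k(N - k)}{(k - 1)(N - k + 1)} > 1 \quad\text{if and only if}\quad k < \frac{N + 1}{2},$$

so that $N'(N - N')$ increases as long as $N' < (N + 1)/2$ and decreases if $N' > (N + 1)/2$. The maximum is achieved at $N' = [(N + 1)/2]$ if $M \le (N + 1)/2$ and at $N' = M$ if $M > (N + 1)/2$, where $[x]$ is the largest integer $\le x$. Therefore,

$$\operatorname{var}_N(T(X)) \ge \left[\frac{N + 1}{2}\right]\left(N - \left[\frac{N + 1}{2}\right]\right) \quad\text{if } M \le \frac{N + 1}{2},$$

and

$$\operatorname{var}_N(T(X)) \ge M(N - M) \quad\text{if } M > \frac{N + 1}{2}.$$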
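The closed form in Example 5 can be checked against a brute-force supremum over $N'$. The sketch uses $\operatorname{var}_N[P_{N'}/P_N] = N/N' - 1$ and exact rational arithmetic, so the comparison is free of rounding. The function names are hypothetical.

```python
from fractions import Fraction

def crk_bound_brute_force(N, M):
    # sup over N' with M <= N' < N of (N - N')^2 / var_N[P_N'(X)/P_N(X)],
    # using var_N[P_N'/P_N] = N/N' - 1; exact rational arithmetic avoids rounding
    return max((N - Np) ** 2 / (Fraction(N, Np) - 1) for Np in range(M, N))

def crk_bound_closed_form(N, M):
    # the closed form derived in Example 5; (N + 1) // 2 is [(N + 1)/2]
    half = (N + 1) // 2
    if 2 * M <= N + 1:                  # M <= (N + 1)/2
        return half * (N - half)
    return M * (N - M)

print(crk_bound_brute_force(20, 3), crk_bound_closed_form(20, 3))  # 100 100
```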


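Example 6. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(0, \sigma^2)$, so that the parameter is $\sigma$ and, as in Remark 9, $\psi(\sigma) = \sigma$. Let us compute $J$ (see Remark 9) for $\delta \ne 0$:

$$J = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\exp\left[\sum_{i=1}^{n}X_i^2\left(\frac{1}{\sigma^2} - \frac{1}{(\sigma + \delta)^2}\right)\right] - 1\right\} = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\exp\left[\frac{\sum X_i^2}{\sigma^2}\cdot\frac{\delta^2 + 2\sigma\delta}{(\sigma + \delta)^2}\right] - 1\right\},$$

and

$$J = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\exp\left(c\,\frac{\sum X_i^2}{\sigma^2}\right) - 1\right\},$$

where $c = (\delta^2 + 2\sigma\delta)/(\sigma + \delta)^2$. Since $\sum X_i^2/\sigma^2 \sim \chi^2(n)$,

$$E_\sigma J = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\frac{1}{(1 - 2c)^{n/2}} - 1\right\} \quad\text{for } c < \frac{1}{2}.$$

Let $k = \delta/\sigma$; then

$$c = \frac{k^2 + 2k}{(1 + k)^2}, \qquad 1 - 2c = \frac{1 - 2k - k^2}{(1 + k)^2},$$

and

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[\frac{1}{(1 + k)^n(1 - 2k - k^2)^{n/2}} - 1\right].$$

Here $1 + k > 0$ and $1 - 2c > 0$, so that $1 - 2k - k^2 > 0$, implying that $-\sqrt{2} < k + 1 < \sqrt{2}$ and also that $k > -1$. Thus $-1 < k < \sqrt{2} - 1$ and $k \ne 0$. Also,

$$\lim_{k\to0}E_\sigma J = \lim_{k\to0}\frac{(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1}{k^2\sigma^2} = \frac{2n}{\sigma^2}$$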
by L'Hospital's rule. We leave it to the reader to check that $\sigma^2/(2n)$, the reciprocal of this limit, is the FCR lower bound for $\operatorname{var}_\sigma(T(X))$. But the minimum value of $E_\sigma J$ is not achieved in the neighborhood of $k = 0$, so that the CRK inequality is sharper than the FCR inequality. Next, we show that for $n = 2$ we can do better with the CRK inequality. We have

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[\frac{1}{(1 - 2k - k^2)(1 + k)^2} - 1\right] = \frac{(k + 2)^2}{\sigma^2(1 - 2k - k^2)(1 + k)^2}, \quad -1 < k < \sqrt{2} - 1,\ k \ne 0.$$

The minimum of $E_\sigma J$ is attained at $k \approx -0.1607$, where $(E_\sigma J)^{-1} = 0.2698\sigma^2$, so that $\operatorname{var}_\sigma(T(X)) \ge 0.2698\sigma^2 > \sigma^2/4$, the FCR bound for $n = 2$. Finally, we show that this bound is by no means the best available; it is possible to improve on the Chapman-Robbins-Kiefer bounds, too, in some cases. Take


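$$T(X_1, X_2, \ldots, X_n) = \frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\sqrt{\frac{\sum_{i=1}^{n}X_i^2}{2}}$$

to be an estimate of $\sigma$. Now $E_\sigma T = \sigma$ and

$$E_\sigma T^2 = \frac{n}{2}\left\{\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right\}^2\sigma^2,$$

so that

$$\operatorname{var}_\sigma(T) = \sigma^2\left\{\frac{n}{2}\left[\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right]^2 - 1\right\}.$$

For $n = 2$,

$$\operatorname{var}_\sigma(T) = \sigma^2\left(\frac{4}{\pi} - 1\right) = 0.2732\sigma^2,$$

which is $> 0.2698\sigma^2$, the CRK bound. Note that $T$ is the UMVUE.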
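The three numbers of Example 6 for $n = 2$ can be reproduced with a few lines of Python. The FCR bound is $\sigma^2/4$. The CRK bound is $1/\inf_k E_\sigma J$ with $E_\sigma J = (1/k^2\sigma^2)[(1+k)^{-n}(1-2k-k^2)^{-n/2} - 1]$. The UMVUE variance is $\sigma^2\{(n/2)[\Gamma(n/2)/\Gamma((n+1)/2)]^2 - 1\}$. This sketch takes $\sigma = 1$, since all three scale with $\sigma^2$, and approximates the infimum on a fine grid of $k$ in $(-1, \sqrt{2} - 1)$. Both of these are our own simplifications.

```python
import math

def expected_J(k, n, sigma=1.0):
    # E_sigma J with delta = k * sigma, valid for -1 < k < sqrt(2) - 1, k != 0
    inner = (1 + k) ** n * (1 - 2 * k - k * k) ** (n / 2)
    return (1 / inner - 1) / (k * k * sigma * sigma)

def crk_bound(n, sigma=1.0, steps=200_000):
    # 1 / inf_k E_sigma J, with the infimum approximated on an equally spaced grid
    lo, hi = -1.0, math.sqrt(2) - 1
    best = math.inf
    for i in range(1, steps):
        k = lo + (hi - lo) * i / steps
        if abs(k) < 1e-9:           # k = 0 is excluded
            continue
        best = min(best, expected_J(k, n, sigma))
    return 1 / best

def var_T(n, sigma=1.0):
    # variance of T = Gamma(n/2) / Gamma((n+1)/2) * sqrt(sum X_i^2 / 2)
    ratio = math.gamma(n / 2) / math.gamma((n + 1) / 2)
    return sigma**2 * ((n / 2) * ratio**2 - 1)

n = 2
fcr = 1 / (2 * n)       # sigma^2 / (2n) with sigma = 1
crk = crk_bound(n)      # about 0.2698
umvue = var_T(n)        # 4/pi - 1, about 0.2732
print(fcr, crk, umvue)
```

Near $k = 0$ the computed $E_\sigma J$ approaches $2n/\sigma^2 = 4$, which matches the L'Hospital limit.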


Remark 11. In general the CRK inequality is at least as sharp as the FCR inequality. See Chapman and Robbins [11, pp. 584-585] for details.
We next introduce the concept of efficiency.

Definition 2. Let $T_1$, $T_2$ be two unbiased estimators for a parameter $\theta$. Suppose that $E_\theta T_1^2 < \infty$, $E_\theta T_2^2 < \infty$. We define the efficiency of $T_1$ relative to $T_2$ by

$$\operatorname{eff}_\theta(T_1 \mid T_2) = \frac{\operatorname{var}_\theta(T_2)}{\operatorname{var}_\theta(T_1)} \tag{14}$$

and say that $T_1$ is more efficient than $T_2$ if

$$\operatorname{eff}_\theta(T_1 \mid T_2) > 1. \tag{15}$$


It is usual to judge the performance of an unbiased estimator by comparing its variance with the lower bound given by the FCR inequality.


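Definition 3. Assume that the regularity conditions of the FCR inequality are satisfied by the family of DFs $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}$. We say that an unbiased estimator $T$ for parameter $\theta$ is most efficient for the family $\{F_\theta\}$ if

$$\operatorname{var}_\theta(T) = \left\{E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2\right\}^{-1} = \frac{1}{I_n(\theta)}. \tag{16}$$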


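Definition 4. Let $T$ be the most efficient estimator for the regular family of DFs $\{F_\theta, \theta \in \Theta\}$. Then the efficiency of any unbiased estimator $T_1$ of $\theta$ is defined as

$$\operatorname{eff}_\theta(T_1) = \operatorname{eff}_\theta(T_1 \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(T_1)} = \frac{1}{I_n(\theta)\operatorname{var}_\theta(T_1)}. \tag{17}$$

Clearly, the efficiency of the most efficient estimator is 1, and the efficiency of any unbiased estimator $T_1$ is $\le 1$.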


Definition 5. We say that an estimator $T_1$ is asymptotically (most) efficient if

$$\lim_{n\to\infty}\operatorname{eff}_\theta(T_1) = 1 \tag{18}$$

and $T_1$ is at least asymptotically unbiased in the sense that $\lim_{n\to\infty}E_\theta T_1 = \theta$. Here $n$ is the sample size.


Remark 12. Definition 3, although in common use, has many drawbacks. We 
have already seen cases in which the regularity conditions are not satisfied and yet 
UMVUEs exist. The definition does not cover such cases. Moreover, in many cases 
where the regularity conditions are satisfied and UMVUEs exist, the UMVUE is not
most efficient since the variance of the best estimator (the UMVUE) does not achieve 
the lower bound of the FCR inequality. 
Example 7. Let $X \sim b(n, p)$. Then we have seen in Example 1 that $X/n$ is the UMVUE since its variance achieves the lower bound of the FCR inequality. It follows that $X/n$ is most efficient.


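Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$ RVs and suppose that $\psi(\lambda) = P(X = 0) = e^{-\lambda}$. From Example 2, the UMVUE of $\psi$ is given by $T_0 = [(n - 1)/n]^{\sum X_i}$ with

$$\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1).$$

Also, the FCR lower bound is $\lambda e^{-2\lambda}/n$. It follows that

$$\operatorname{eff}_\lambda(T_0) = \frac{\lambda e^{-2\lambda}/n}{e^{-2\lambda}(e^{\lambda/n} - 1)} = \frac{\lambda/n}{e^{\lambda/n} - 1} < 1,$$

since $e^x - 1 > x$ for $x > 0$. Thus $T_0$ is not most efficient. However, since $\operatorname{eff}_\lambda(T_0) \to 1$ as $n \to \infty$, $T_0$ is asymptotically efficient.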
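To see how quickly $T_0$ becomes efficient, the sketch below evaluates $\operatorname{eff}_\lambda(T_0) = (\lambda/n)/(e^{\lambda/n} - 1)$ for increasing $n$. The value $\lambda = 2$ and the function name are illustrative choices. `math.expm1` is used so that $e^x - 1$ stays accurate for small $x = \lambda/n$.

```python
import math

def efficiency_T0(lam, n):
    # eff_lambda(T0) = (lambda/n) / (e^{lambda/n} - 1); expm1 keeps precision for small lambda/n
    x = lam / n
    return x / math.expm1(x)

for n in (1, 5, 25, 125, 625):
    print(n, efficiency_T0(2.0, n))   # increases toward 1 as n grows
```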


In view of Remarks 6 and 7, the following result describes the relationship between most efficient unbiased estimators and UMVUEs.

Theorem 3. A necessary and sufficient condition for an unbiased estimator $T$ of $\psi$ to be most efficient is that $T$ be sufficient and the relation (8) holds for some function $k(\theta)$.

Clearly, an estimator $T$ satisfying the conditions of Theorem 3 will be the UMVUE, and the two estimators coincide. We emphasize that we have assumed the regularity conditions of the FCR inequality in making this statement.


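Example 9. Let $(X, Y)$ be jointly distributed with PDF

$$f_\theta(x, y) = \exp\left[-\left(\frac{x}{\theta} + \theta y\right)\right], \quad x > 0,\ y > 0.$$

For a sample $(x, y)$ of size 1, we have

$$\frac{\partial}{\partial\theta}\log f_\theta(x, y) = -\frac{\partial}{\partial\theta}\left(\frac{x}{\theta} + \theta y\right) = \frac{x}{\theta^2} - y.$$

Hence, the information for this sample is

$$I(\theta) = E_\theta\left(Y - \frac{X}{\theta^2}\right)^2 = E_\theta(Y^2) + \frac{E_\theta(X^2)}{\theta^4} - \frac{2E_\theta(XY)}{\theta^2}.$$

Now

$$E_\theta(Y^2) = \frac{2}{\theta^2}, \quad E_\theta(X^2) = 2\theta^2, \quad\text{and}\quad E_\theta(XY) = 1,$$

so that

$$I(\theta) = \frac{2}{\theta^2} + \frac{2}{\theta^2} - \frac{2}{\theta^2} = \frac{2}{\theta^2}.$$

Therefore, the Fisher information in a sample of $n$ pairs is $2n/\theta^2$.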
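We return to Example 8.3.23, where $X_1, X_2, \ldots, X_n$ are iid $G(1, \theta)$ and $Y_1, Y_2, \ldots, Y_n$ are iid $G(1, 1/\theta)$, and the $X$'s and $Y$'s are independent. Then $(X_i, Y_i)$ has common PDF $f_\theta(x, y)$ given above. We will compute Fisher's information for $\theta$ in the family of PDFs of $S(\mathbf{X}, \mathbf{Y}) = (\sum X_i/\sum Y_i)^{1/2}$. Using the PDFs of $\sum X_i \sim G(n, \theta)$ and $\sum Y_i \sim G(n, 1/\theta)$ and the transformation technique, it is easy to see that $S(\mathbf{X}, \mathbf{Y})$ has PDF

$$g_\theta(s) = \frac{2\Gamma(2n)}{\Gamma^2(n)}\,\frac{1}{s}\left(\frac{s}{\theta} + \frac{\theta}{s}\right)^{-2n}, \quad s > 0.$$

Then

$$\frac{\partial\log g_\theta(s)}{\partial\theta} = -2n\left(\frac{1}{s} - \frac{s}{\theta^2}\right)\left(\frac{s}{\theta} + \frac{\theta}{s}\right)^{-1} = \frac{2n}{\theta}\cdot\frac{s^2/\theta^2 - 1}{s^2/\theta^2 + 1}.$$

Since $S^2/\theta^2 = (\sum X_i/\theta)/(\theta\sum Y_i)$ is the ratio of two independent $G(n, 1)$ RVs, $B = (S^2/\theta^2)/(1 + S^2/\theta^2)$ has a beta distribution with parameters $(n, n)$, and $(S^2/\theta^2 - 1)/(S^2/\theta^2 + 1) = 2B - 1$. Because $E(2B - 1)^2 = 4\operatorname{var}(B) = 1/(2n + 1)$, it follows that

$$I_S(\theta) = E_\theta\left[\frac{\partial\log g_\theta(S)}{\partial\theta}\right]^2 = \frac{4n^2}{(2n + 1)\theta^2} = \frac{2n}{\theta^2}\cdot\frac{2n}{2n + 1} < \frac{2n}{\theta^2}.$$

That is, the information about $\theta$ in $S$ is smaller than that in the sample.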
The Fisher information in the conditional PDF of $S$ given $A = a$, where $A(\mathbf{X}, \mathbf{Y}) = S_1(\mathbf{X})S_2(\mathbf{Y})$, can be shown (Problem 12) to equal

$$\frac{2a}{\theta^2}\,\frac{K_1(2a)}{K_0(2a)},$$

where $K_0$ and $K_1$ are Bessel functions of order 0 and 1, respectively. Averaging over all values of $A$, one can show that the information is $2n/\theta^2$, which is the total Fisher information in the sample of $n$ pairs $(x_i, y_i)$.
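In Example 9 the information in $S$ equals $4n^2/[(2n+1)\theta^2]$, which is smaller than $2n/\theta^2$. The Monte Carlo sketch below checks this by averaging the squared score of $g_\theta$ over simulated values of $S = (\sum X_i/\sum Y_i)^{1/2}$. It assumes the text's $G(\alpha, \beta)$ is the gamma law with shape $\alpha$ and scale $\beta$, which matches Python's `random.gammavariate(alpha, beta)`. The seed, replication count, and helper names are our own choices.

```python
import math
import random

def score_of_S(s, theta, n):
    # d/dtheta log g_theta(s) = -2n (1/s - s/theta^2) / (s/theta + theta/s)
    return -2 * n * (1 / s - s / theta**2) / (s / theta + theta / s)

def information_in_S(theta, n, reps=60_000, seed=12345):
    # Monte Carlo estimate of E_theta[score(S)^2] with S = sqrt(sum X_i / sum Y_i)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        u = rng.gammavariate(n, theta)      # sum of X_i ~ G(n, theta): shape n, scale theta
        v = rng.gammavariate(n, 1 / theta)  # sum of Y_i ~ G(n, 1/theta)
        total += score_of_S(math.sqrt(u / v), theta, n) ** 2
    return total / reps

theta, n = 2.0, 3
exact_in_S = 4 * n**2 / ((2 * n + 1) * theta**2)   # 4n^2 / [(2n + 1) theta^2]
in_sample = 2 * n / theta**2                        # 2n / theta^2 for all n pairs
estimate = information_in_S(theta, n)
print(estimate, exact_in_S, in_sample)
```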
PROBLEMS 8.5

1. Are the following families of distributions regular in the sense of Fréchet, Cramér, and Rao? If so, find the lower bound for the variance of an unbiased estimator based on a sample of size $n$.
(a) $f_\theta(x) = \theta^{-1}e^{-x/\theta}$ if $x > 0$, and $= 0$ otherwise; $\theta > 0$.
(b) $f_\theta(x) = e^{-(x - \theta)}$ if $\theta < x < \infty$, and $= 0$ otherwise.
(c) $f_\theta(x) = \theta(1 - \theta)^x$, $x = 0, 1, 2, \ldots$; $0 < \theta < 1$.
(d) $f(x; \sigma^2) = \dfrac{1}{\sigma\sqrt{2\pi}}e^{-x^2/(2\sigma^2)}$, $-\infty < x < \infty$; $\sigma^2 > 0$.

2. Find the CRK lower bound for the variance of an unbiased estimator of $\theta$, based on a sample of size $n$ from the PDF of Problem 1(b).

3. Find the CRK bound for the variance of an unbiased estimator of $\theta$ in sampling from $N(\theta, 1)$.

4. In Problem 1 check to see whether there exists a most efficient estimator in each case.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from a three-point distribution:

$$P\{X = y_1\} = \frac{1 - \theta}{2}, \quad P\{X = y_2\} = \frac{1}{2}, \quad\text{and}\quad P\{X = y_3\} = \frac{\theta}{2},$$

where $0 < \theta < 1$. Does the FCR inequality apply in this case? If so, what is the lower bound for the variance of an unbiased estimator of $\theta$?


6. Let $X_1, X_2, \ldots, X_n$ be iid RVs with mean $\mu$ and finite variance. What is the efficiency of the unbiased (and consistent) estimator $[2/n(n + 1)]\sum_{i=1}^{n}iX_i$ relative to $\bar{X}$?

7. When does the equality hold in the CRK inequality?

8. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and let $\psi(\mu) = \mu^2$.
(a) Show that the minimum variance of any estimator of $\mu^2$ from the FCR inequality is $4\mu^2/n$.
(b) Show that $T(X_1, X_2, \ldots, X_n) = \bar{X}^2 - 1/n$ is the UMVUE of $\mu^2$ with variance $(4\mu^2/n + 2/n^2)$.

9. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\alpha)$ RVs.
(a) Show that the estimator $T(X_1, X_2, \ldots, X_n) = (n - 1)/(n\bar{X})$ is the UMVUE for $\alpha$ with variance $\alpha^2/(n - 2)$.
(b) Show that the minimum variance from the FCR inequality is $\alpha^2/n$.

10. In Problem 8.4.16, compute the relative efficiency of fig with respect to 21.

11. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be independent samples from $N(\mu, \sigma_1^2)$ and $N(\mu, \sigma_2^2)$, respectively, where $\mu$, $\sigma_1^2$, $\sigma_2^2$ are unknown. Let $\rho = \sigma_2^2/\sigma_1^2$ and $\theta = m/n$, and consider the problem of unbiased estimation of $\mu$.
(a) If $\rho$ is known, show that

$$\hat{\mu}_0 = \alpha\bar{X} + (1 - \alpha)\bar{Y},$$

where $\alpha = \rho/(\rho + \theta)$, is the BLUE of $\mu$. Compute $\operatorname{var}(\hat{\mu}_0)$.
(b) If $\rho$ is unknown, the unbiased estimator

$$\tilde{\mu} = \frac{\bar{X} + \theta\bar{Y}}{1 + \theta}$$

is optimum in the neighborhood of $\rho = 1$. Find the variance of $\tilde{\mu}$.
(c) Compute the efficiency of $\tilde{\mu}$ relative to $\hat{\mu}_0$.
(d) Another unbiased estimator of $\mu$ is

$$\hat{\mu}_1 = \frac{\rho F\bar{X} + \theta\bar{Y}}{\rho F + \theta},$$

where $F = S_2^2/(\rho S_1^2)$ is an $F(m - 1, n - 1)$ RV.


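12. Show that the Fisher information on $\theta$ based on the PDF

$$f(s) = \frac{1}{2K_0(2a)}\,\frac{1}{s}\exp\left[-a\left(\frac{s}{\theta} + \frac{\theta}{s}\right)\right], \quad s > 0,$$

for fixed $a$ equals $(2a/\theta^2)[K_1(2a)/K_0(2a)]$, where $K_0(2a)$ and $K_1(2a)$ are Bessel functions of order 0 and 1, respectively.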


8.6 SUBSTITUTION PRINCIPLE (METHOD OF MOMENTS) 


One of the simplest and oldest methods of estimation is the substitution principle: Let $\psi(\theta)$, $\theta \in \Theta$, be a parametric function to be estimated on the basis of a random sample $X_1, X_2, \ldots, X_n$ from a population DF $F$. Suppose that we can write $\psi(\theta) = h(F)$ for some known function $h$. Then the substitution principle estimator of $\psi(\theta)$ is $h(F_n^*)$, where $F_n^*$ is the sample distribution function. Accordingly, we estimate $\mu = \mu(F)$ by $\mu(F_n^*) = \bar{X}$, $m_j = EX^j$ by $\sum_{i=1}^{n}X_i^j/n$, and so on. The method of moments is a special case when we need to estimate some known function of a finite number of unknown moments. Let us suppose that we are interested in estimating

$$\theta = h(m_1, m_2, \ldots, m_k), \tag{1}$$

where $h$ is some known numerical function and $m_j$ is the $j$th-order moment of the population distribution that is known to exist for $1 \le j \le k$.
Definition 1. The method of moments consists in estimating $\theta$ by the statistic

$$T(X_1, \ldots, X_n) = h\left(\frac{1}{n}\sum_{i=1}^{n}X_i, \frac{1}{n}\sum_{i=1}^{n}X_i^2, \ldots, \frac{1}{n}\sum_{i=1}^{n}X_i^k\right). \tag{2}$$

To make sure that $T$ is a statistic, we will assume that $h : \mathbb{R}_k \to \mathbb{R}$ is a Borel-measurable function.


Remark 1. It is easy to extend the method to the estimation of joint moments. Thus we use $n^{-1}\sum_{i=1}^{n}X_iY_i$ to estimate $E(XY)$, and so on.

Remark 2. From the WLLN, $n^{-1}\sum_{i=1}^{n}X_i^j \xrightarrow{P} EX^j$. Thus, if one is interested in estimating the population moments, the method of moments leads to consistent and unbiased estimators. Moreover, the method of moments estimators in this case are asymptotically normally distributed (see Section 7.5).

Again, if one estimates parameters of the type $\theta$ defined in (1) and $h$ is a continuous function, the estimators $T(X_1, X_2, \ldots, X_n)$ defined in (2) are consistent for $\theta$ (see Problem 1). Under some mild conditions on $h$, the estimator $T$ is also asymptotically normal (see Cramér [16, pp. 386-387]).


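Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Then $\sigma = \sqrt{m_2 - m_1^2}$, and the method of moments estimator for $\sigma$ is given by

$$T(X_1, \ldots, X_n) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}X_i^2 - \bar{X}^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$

Although $T$ is consistent and asymptotically normal for $\sigma$, it is not unbiased.

In particular, if $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs, we know that $EX_1 = \lambda$ and $\operatorname{var}(X_1) = \lambda$. The method of moments leads to using either $\bar{X}$ or $\sum_{i=1}^{n}(X_i - \bar{X})^2/n$ as an estimator of $\lambda$. To avoid this kind of ambiguity we take the estimator involving the lowest-order sample moment.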


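Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from

$$f(x) = \begin{cases}\dfrac{1}{b - a}, & a < x < b,\\ 0, & \text{otherwise}.\end{cases}$$

Then

$$EX = \frac{a + b}{2} \quad\text{and}\quad \operatorname{var}(X) = \frac{(b - a)^2}{12}.$$

The method of moments leads to estimating $EX$ by $\bar{X}$ and $\operatorname{var}(X)$ by $\sum_{i=1}^{n}(X_i - \bar{X})^2/n$, so that the estimators for $a$ and $b$, respectively, are

$$T_1(X_1, \ldots, X_n) = \bar{X} - \sqrt{\frac{3\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}}, \qquad T_2(X_1, \ldots, X_n) = \bar{X} + \sqrt{\frac{3\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}}.$$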


Example 3. Let $X_1, X_2, \ldots, X_N$ be iid $b(n, p)$ RVs, where both $n$ and $p$ are unknown. The method of moments estimators of $p$ and $n$ are given by solving

$$\bar{X} = EX = np$$

and

$$\frac{1}{N}\sum_{i=1}^{N}X_i^2 = EX^2 = np(1 - p) + n^2p^2.$$

Solving for $n$ and $p$, we get the estimator for $p$ as

$$T_1(X_1, \ldots, X_N) = \frac{\bar{X}}{T_2(X_1, \ldots, X_N)},$$

where $T_2(X_1, \ldots, X_N)$ is the estimator for $n$, given by

$$T_2(X_1, X_2, \ldots, X_N) = \frac{\bar{X}^2}{\bar{X} + \bar{X}^2 - \left(\sum_{i=1}^{N}X_i^2/N\right)}.$$

Note that $\bar{X} \xrightarrow{P} np$ and $\sum_{i=1}^{N}X_i^2/N \xrightarrow{P} np(1 - p) + n^2p^2$, so that both $T_1$ and $T_2$ are consistent estimators.
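The two moment equations of Example 3 can be solved directly in code. In this sketch the function name and the six-observation data set are made up for illustration. The `ValueError` guard is our own addition. It covers samples whose (divide-by-$N$) variance is at least the sample mean, where the formula for $T_2$ gives a negative or undefined estimate of $n$. This is one concrete way moment estimators can turn out inadmissible.

```python
def binomial_moment_estimators(xs):
    # Method of moments for b(n, p) with n and p both unknown (Example 3):
    # T2 = xbar^2 / (xbar + xbar^2 - sum x_i^2 / N),  T1 = xbar / T2
    N = len(xs)
    xbar = sum(xs) / N
    m2 = sum(x * x for x in xs) / N
    denom = xbar + xbar**2 - m2          # = xbar minus the (divide-by-N) sample variance
    if denom <= 0:
        # sample variance >= sample mean: the moment equations have no admissible solution
        raise ValueError("moment estimate of n is not positive for this sample")
    t2 = xbar**2 / denom                 # estimator of n
    t1 = xbar / t2                       # estimator of p
    return t1, t2

p_hat, n_hat = binomial_moment_estimators([2, 4, 3, 5, 1, 3])
print(p_hat, n_hat)   # about 0.444 and 6.75
```

$T_2$ is not rounded to an integer here, since the text does not round it.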

The method of moments may lead to absurd estimators. The reader is asked to compute estimators of $\theta$ in $N(\theta, \theta)$ or $N(\theta, \theta^2)$ by the method of moments and verify this assertion.


PROBLEMS 8.6 


1. Let $X_n \xrightarrow{P} a$ and $Y_n \xrightarrow{P} b$, where $a$ and $b$ are constants. Let $h : \mathbb{R}_2 \to \mathbb{R}$ be a continuous function. Show that $h(X_n, Y_n) \xrightarrow{P} h(a, b)$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(\alpha, \beta)$. Find the method of moments estimator for $(\alpha, \beta)$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Find the method of moments estimator for $(\mu, \sigma^2)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $B(\alpha, \beta)$. Find the method of moments estimator for $(\alpha, \beta)$.

5. A random sample of size $n$ is taken from the lognormal PDF

$$f(x; \mu, \sigma) = (\sigma\sqrt{2\pi})^{-1}x^{-1}\exp\left[-\frac{1}{2\sigma^2}(\log x - \mu)^2\right], \quad x > 0.$$

Find the method of moments estimators for $\mu$ and $\sigma^2$.
8.7 MAXIMUM LIKELIHOOD ESTIMATORS

In this section we study a frequently used method of estimation, namely, the method of maximum likelihood estimation. Consider the following example.


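Example 1. Let $X \sim b(n, p)$. One observation on $X$ is available, and it is known that $n$ is either 2 or 3 and $p = \frac{1}{2}$ or $\frac{1}{3}$. Our objective is to estimate the pair $(n, p)$. The following table gives the probability that $X = x$ for each possible pair $(n, p)$:

$$\begin{array}{c|cccc|c}
x & (2, \frac{1}{2}) & (2, \frac{1}{3}) & (3, \frac{1}{2}) & (3, \frac{1}{3}) & \text{Maximum probability}\\ \hline
0 & \frac{1}{4} & \frac{4}{9} & \frac{1}{8} & \frac{8}{27} & \frac{4}{9}\\
1 & \frac{1}{2} & \frac{4}{9} & \frac{3}{8} & \frac{12}{27} & \frac{1}{2}\\
2 & \frac{1}{4} & \frac{1}{9} & \frac{3}{8} & \frac{6}{27} & \frac{3}{8}\\
3 & 0 & 0 & \frac{1}{8} & \frac{1}{27} & \frac{1}{8}
\end{array}$$

The last column gives the maximum probability in each row, that is, for each value that $X$ assumes. If the value $x = 1$, say, is observed, it is more probable that it came from the distribution $b(2, \frac{1}{2})$ than from any of the other distributions, and so on. The following estimator is therefore reasonable in that it maximizes the probability of the value observed:

$$(\hat{n}, \hat{p})(x) = \begin{cases}(2, \frac{1}{3}) & \text{if } x = 0,\\ (2, \frac{1}{2}) & \text{if } x = 1,\\ (3, \frac{1}{2}) & \text{if } x = 2,\\ (3, \frac{1}{2}) & \text{if } x = 3.\end{cases}$$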
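The table and the estimator of Example 1 can be generated mechanically. The sketch below computes each probability as an exact fraction and picks the maximizing pair for each $x$. `Fraction` prints in lowest terms, so for example $12/27$ appears as $4/9$. The names `PAIRS`, `pmf`, and `mle` are illustrative.

```python
from fractions import Fraction
from math import comb

PAIRS = [(2, Fraction(1, 2)), (2, Fraction(1, 3)), (3, Fraction(1, 2)), (3, Fraction(1, 3))]

def pmf(x, n, p):
    # P{X = x} for X ~ b(n, p), as an exact fraction; 0 when x > n
    if x > n:
        return Fraction(0)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mle(x):
    # the pair (n, p) among the four candidates that maximizes P{X = x}
    return max(PAIRS, key=lambda pair: pmf(x, *pair))

for x in range(4):
    row = [str(pmf(x, n, p)) for n, p in PAIRS]
    print(x, row, mle(x))
```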
The principle of maximum likelihood essentially assumes that the sample is representative of the population and chooses as the estimator that value of the parameter which maximizes the PDF (PMF) $f_\theta(\mathbf{x})$.

Definition 1. Let $(X_1, X_2, \ldots, X_n)$ be a random vector with PDF (PMF) $f_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. The function

$$L(\theta; x_1, x_2, \ldots, x_n) = f_\theta(x_1, x_2, \ldots, x_n), \tag{1}$$

considered as a function of $\theta$, is called the likelihood function.


Usually, $\theta$ will be a multiple parameter. If $X_1, X_2, \ldots, X_n$ are iid with PDF (PMF) $f_\theta(x)$, the likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n}f_\theta(x_i). \tag{2}$$

Let $\Theta \subseteq \mathbb{R}_k$ and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$.

Definition 2. The principle of maximum likelihood estimation consists of choosing as an estimator of $\theta$ a $\hat{\theta}(\mathbf{X})$ that maximizes $L(\theta; x_1, x_2, \ldots, x_n)$, that is, to find a mapping $\hat{\theta}$ of $\mathbb{R}_n \to \mathbb{R}_k$ that satisfies

$$L(\hat{\theta}; x_1, x_2, \ldots, x_n) = \sup_{\theta\in\Theta}L(\theta; x_1, x_2, \ldots, x_n). \tag{3}$$

(Constants are not admissible as estimators.) If a $\hat{\theta}$ satisfying (3) exists, we call it a maximum likelihood estimator (MLE).


It is convenient to work with the logarithm of the likelihood function. Since log is a monotone function,

$$\log L(\hat{\theta}; x_1, \ldots, x_n) = \sup_{\theta\in\Theta}\log L(\theta; x_1, \ldots, x_n). \tag{4}$$

Let $\Theta$ be an open subset of $\mathbb{R}_k$, and suppose that $f_\theta(\mathbf{x})$ is a positive, differentiable function of $\theta$ (that is, the first-order partial derivatives exist in the components of $\theta$). If a supremum $\hat{\theta}$ exists, it must satisfy the likelihood equations

$$\frac{\partial\log L(\hat{\theta}; x_1, \ldots, x_n)}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, k, \quad \theta = (\theta_1, \ldots, \theta_k). \tag{5}$$

Any nontrivial root of the likelihood equations (5) is called an MLE in the loose sense. A parameter value that provides the absolute maximum of the likelihood function is called an MLE in the strict sense or, simply, an MLE.
Remark 1. If $\Theta \subseteq \mathbb{R}$, there may still be many problems. Often, the likelihood equation $\partial L/\partial\theta = 0$ has more than one root, or the likelihood function is not differentiable everywhere in $\Theta$, or $\hat{\theta}$ may be a terminal value. Sometimes the likelihood equation may be quite complicated and difficult to solve explicitly. In that case one may have to resort to some numerical procedure to obtain the estimator. Similar remarks apply to the multiparameter case.


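Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Here $\Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ \sigma^2 > 0\}$. The likelihood function is

$$L(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_{i=1}^{n}\frac{(x_i - \mu)^2}{2\sigma^2}\right]$$

and

$$\log L(\mu, \sigma^2; \mathbf{x}) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}.$$

The likelihood equations are

$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$$

and

$$-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0.$$

Solving the first of these equations for $\mu$, we get $\hat{\mu} = \bar{X}$ and, substituting in the second, $\hat{\sigma}^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/n$. We see that $(\hat{\mu}, \hat{\sigma}^2) \in \Theta$ with probability 1. We show that $(\hat{\mu}, \hat{\sigma}^2)$ maximizes the likelihood function. First note that $\bar{X}$ maximizes $L(\mu, \sigma^2; \mathbf{x})$ whatever $\sigma^2$ is, since $\sum(x_i - \mu)^2$ is minimized at $\mu = \bar{x}$ [and $L(\mu, \sigma^2; \mathbf{x}) \to 0$ as $|\mu| \to \infty$]. Next, since $\hat{\sigma}^2 > 0$, $L(\hat{\mu}, \sigma^2; \mathbf{x}) \to 0$ as $\sigma^2 \to 0$ or $\sigma^2 \to \infty$, so the only stationary point $\sigma^2 = \hat{\sigma}^2$ of $L(\hat{\mu}, \sigma^2; \mathbf{x})$ must be its maximum.

Note that $\hat{\sigma}^2$ is not unbiased for $\sigma^2$. Indeed, $E\hat{\sigma}^2 = [(n - 1)/n]\sigma^2$. But $n\hat{\sigma}^2/(n - 1) = S^2$ is unbiased, as we already know. Also, $\hat{\mu}$ is unbiased, and both $\hat{\mu}$ and $\hat{\sigma}^2$ are consistent. In addition, $\hat{\mu}$ and $\hat{\sigma}^2$ are method of moments estimators for $\mu$ and $\sigma^2$, and $(\hat{\mu}, \hat{\sigma}^2)$ is jointly sufficient.

Finally, note that $\hat{\mu}$ is the MLE of $\mu$ if $\sigma^2$ is known; but if $\mu$ is known, the MLE of $\sigma^2$ is not $\hat{\sigma}^2$ but $\sum(X_i - \mu)^2/n$.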


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF

$$P_N(k) = \begin{cases}\dfrac{1}{N}, & k = 1, 2, \ldots, N,\\ 0, & \text{otherwise}.\end{cases}$$

The likelihood function is

$$L(N; k_1, k_2, \ldots, k_n) = \begin{cases}\dfrac{1}{N^n}, & 1 \le \max(k_1, \ldots, k_n) \le N,\\ 0, & \text{otherwise}.\end{cases}$$

Clearly, the MLE of $N$ is given by

$$\hat{N}(X_1, X_2, \ldots, X_n) = \max(X_1, X_2, \ldots, X_n),$$

for if we take any $N' < \hat{N}$ as the MLE, then $P_{N'}(k_1, k_2, \ldots, k_n) = 0$; and if we take $N'' > \hat{N}$ as the MLE, then $P_{N''}(k_1, k_2, \ldots, k_n) = 1/(N'')^n < 1/(\hat{N})^n = P_{\hat{N}}(k_1, k_2, \ldots, k_n)$.

We see that the MLE $\hat{N}$ is consistent, sufficient, and complete, but not unbiased.


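Example 4. Consider the hypergeometric PMF

$$P_N(x) = \begin{cases}\dfrac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, & \max(0, n - N + M) \le x \le \min(n, M),\\ 0, & \text{otherwise},\end{cases}$$

with $M$ and $n$ known. To find the MLE $\hat{N} = \hat{N}(X)$ of $N$, consider the ratio

$$R(N) = \frac{P_N(x)}{P_{N-1}(x)} = \frac{N - n}{N}\cdot\frac{N - M}{N - M - n + x}.$$

For values of $N$ for which $R(N) > 1$, $P_N(x)$ increases with $N$, and for values of $N$ for which $R(N) < 1$, $P_N(x)$ is a decreasing function of $N$. Since $(N - n)(N - M) > N(N - M - n + x)$ reduces to $nM > Nx$,

$$R(N) > 1 \quad\text{if and only if}\quad N < \frac{nM}{x}$$

and

$$R(N) < 1 \quad\text{if and only if}\quad N > \frac{nM}{x}.$$

It follows that $P_N(x)$ reaches its maximum value where $N \approx nM/x$. Thus $\hat{N}(X) = [nM/X]$, where $[x]$ denotes the largest integer $\le x$.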
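The rule $\hat{N} = [nM/x]$ of Example 4 can be compared with a direct search over $N$. The sketch requires $x \ge 1$, because for $x = 0$ the likelihood keeps increasing in $N$. The cap `N_max=5000` and the test triples are our own choices. The triples are picked so that $nM/x$ is not an integer, which avoids ties between two maximizing values of $N$.

```python
from math import comb

def hypergeometric_pmf(x, N, M, n):
    # P_N(x) = C(M, x) C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def mle_of_N_by_search(x, M, n, N_max=5000):
    # maximize P_N(x) over admissible N (requires x >= 1; for x = 0 the likelihood keeps increasing in N)
    N_min = max(M, n, M + n - x)
    return max(range(N_min, N_max + 1), key=lambda N: hypergeometric_pmf(x, N, M, n))

M, n, x = 20, 10, 3
print(mle_of_N_by_search(x, M, n), (n * M) // x)   # both 66
```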


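Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[\theta - \frac{1}{2}, \theta + \frac{1}{2}]$. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \begin{cases}1 & \text{if } \theta - \frac{1}{2} \le \min(x_1, \ldots, x_n) \le \max(x_1, \ldots, x_n) \le \theta + \frac{1}{2},\\ 0 & \text{otherwise}.\end{cases}$$

Thus $L(\theta; \mathbf{x})$ attains its maximum provided that

$$\theta - \tfrac{1}{2} \le \min(x_1, \ldots, x_n) \quad\text{and}\quad \theta + \tfrac{1}{2} \ge \max(x_1, \ldots, x_n),$$

or when

$$\theta \le \min(x_1, \ldots, x_n) + \tfrac{1}{2} \quad\text{and}\quad \theta \ge \max(x_1, \ldots, x_n) - \tfrac{1}{2}.$$

It follows that every statistic $T(X_1, X_2, \ldots, X_n)$ such that

$$\max_{1\le i\le n}X_i - \tfrac{1}{2} \le T(X_1, X_2, \ldots, X_n) \le \min_{1\le i\le n}X_i + \tfrac{1}{2} \tag{6}$$

is an MLE of $\theta$. Indeed, for $0 < \alpha < 1$,

$$T_\alpha(X_1, \ldots, X_n) = \max_{1\le i\le n}X_i - \tfrac{1}{2} + \alpha\left(1 + \min_{1\le i\le n}X_i - \max_{1\le i\le n}X_i\right)$$

lies in interval (6), and hence for each $\alpha$, $0 < \alpha < 1$, $T_\alpha(X_1, \ldots, X_n)$ is an MLE of $\theta$. In particular, if $\alpha = \frac{1}{2}$,

$$T_{1/2}(X_1, \ldots, X_n) = \frac{\min X_i + \max X_i}{2}$$

is an MLE of $\theta$.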


Example 6. Let $X \sim b(1, p)$, $p \in [\frac{1}{4}, \frac{3}{4}]$. In this case $L(p; x) = p^x(1 - p)^{1-x}$, $x = 0, 1$, and we cannot differentiate $L(p; x)$ to get the MLE of $p$, since that would lead to $\hat{p} = x$, a value that does not lie in $\Theta = [\frac{1}{4}, \frac{3}{4}]$. We have

$$L(p; x) = \begin{cases}p, & x = 1,\\ 1 - p, & x = 0,\end{cases}$$

which is maximized if we choose $\hat{p}(x) = \frac{1}{4}$ if $x = 0$, and $= \frac{3}{4}$ if $x = 1$. Thus the MLE of $p$ is given by

$$\hat{p}(X) = \frac{2X + 1}{4}.$$

Note that $E_p\hat{p}(X) = (2p + 1)/4$, so that $\hat{p}$ is biased. Also, the mean square error for $\hat{p}$ is

$$E_p(\hat{p}(X) - p)^2 = \frac{1}{16}E_p(2X + 1 - 4p)^2 = \frac{1}{16}.$$

In the sense of the MSE, the MLE is worse than the trivial estimator $\delta(X) = \frac{1}{2}$, for

$$E_p\left(\tfrac{1}{2} - p\right)^2 = \left(\tfrac{1}{2} - p\right)^2 \le \tfrac{1}{16} \quad\text{for } p \in [\tfrac{1}{4}, \tfrac{3}{4}].$$

Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose that $p \in (0, 1)$. If $(0, 0, \ldots, 0)$ [$(1, 1, \ldots, 1)$] is observed, $\bar{X} = 0$ ($\bar{X} = 1$) is the MLE, which is not an admissible value of $p$. Hence an MLE does not exist.


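Example 8 (Oliver [76]). This example illustrates a distribution for which an MLE is necessarily an actual observation, but not necessarily any particular observation. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases}\dfrac{2}{\alpha}\cdot\dfrac{x}{\theta}, & 0 \le x \le \theta,\\[1ex] \dfrac{2}{\alpha}\cdot\dfrac{\alpha - x}{\alpha - \theta}, & \theta \le x \le \alpha,\\[1ex] 0, & \text{otherwise},\end{cases}$$

where $\alpha > 0$ is a (known) constant and $0 < \theta < \alpha$. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{2}{\alpha}\right)^n\prod_{x_i\le\theta}\frac{x_i}{\theta}\prod_{x_i>\theta}\frac{\alpha - x_i}{\alpha - \theta},$$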

where we have assumed that the observations are arranged in increasing order of magnitude, $0 \le x_1 \le x_2 \le \cdots \le x_n \le \alpha$. Clearly, $L$ is continuous in $\theta$ (even for $\theta$ = some $x_i$) and differentiable for values of $\theta$ between any two $x_i$'s. Thus, for $x_j < \theta < x_{j+1}$, we have

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n\theta^{-j}(\alpha - \theta)^{-(n-j)}\prod_{i=1}^{j}x_i\prod_{i=j+1}^{n}(\alpha - x_i),$$

$$\frac{\partial\log L}{\partial\theta} = -\frac{j}{\theta} + \frac{n - j}{\alpha - \theta}, \qquad \frac{\partial^2\log L}{\partial\theta^2} = \frac{j}{\theta^2} + \frac{n - j}{(\alpha - \theta)^2} > 0.$$

It follows that any stationary value that exists must be a minimum, so that there can be no maximum in any range $x_j < \theta < x_{j+1}$. Moreover, there can be no maximum in $0 < \theta < x_1$ or $x_n < \theta < \alpha$. This follows since for $0 < \theta < x_1$,

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n(\alpha - \theta)^{-n}\prod_{i=1}^{n}(\alpha - x_i)$$

is a strictly increasing function of $\theta$. By symmetry, $L(\theta)$ is a strictly decreasing function of $\theta$ in $x_n < \theta < \alpha$. We conclude that an MLE has to be one of the observations.

In particular, let $\alpha = 5$ and $n = 3$, and suppose that the observations, arranged in increasing order of magnitude, are 1, 2, 4. In this case the MLE can be shown to be $\hat{\theta} = 1$, which corresponds to the first-order statistic. If the sample values are 2, 3, 4, the third-order statistic is the MLE.
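The two numerical claims at the end of Example 8 can be checked by evaluating the likelihood of the triangular PDF. The sketch does this at each observation and also on a fine grid over $(0, \alpha)$. The grid search is our own independent check that no non-observation value does better. The function names are hypothetical.

```python
def likelihood(theta, xs, alpha):
    # L(theta) for the triangular PDF of Example 8 on [0, alpha] with mode theta (0 < theta < alpha)
    L = 1.0
    for x in xs:
        if x <= theta:
            L *= (2 / alpha) * x / theta
        else:
            L *= (2 / alpha) * (alpha - x) / (alpha - theta)
    return L

def mle_among_observations(xs, alpha):
    # by the argument above, an MLE must be one of the observations
    return max(xs, key=lambda t: likelihood(t, xs, alpha))

def mle_on_grid(xs, alpha, steps=50_000):
    # brute-force check over a fine grid of theta values in (0, alpha)
    grid = (alpha * i / steps for i in range(1, steps))
    return max(grid, key=lambda t: likelihood(t, xs, alpha))

print(mle_among_observations([1, 2, 4], 5), mle_among_observations([2, 3, 4], 5))  # 1 4
```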


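Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(r, 1/\beta)$; $\beta > 0$ and $r > 0$ are both unknown. The likelihood function is

$$L(\beta, r; x_1, x_2, \ldots, x_n) = \begin{cases}\dfrac{\beta^{nr}}{[\Gamma(r)]^n}\displaystyle\prod_{i=1}^{n}x_i^{r-1}\exp\left(-\beta\sum_{i=1}^{n}x_i\right), & x_i > 0,\\ 0, & \text{otherwise}.\end{cases}$$

Then

$$\log L(\beta, r) = nr\log\beta - n\log\Gamma(r) + (r - 1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n}x_i,$$

$$\frac{\partial\log L(\beta, r)}{\partial\beta} = \frac{nr}{\beta} - \sum_{i=1}^{n}x_i = 0,$$

and

$$\frac{\partial\log L(\beta, r)}{\partial r} = n\log\beta - n\frac{\Gamma'(r)}{\Gamma(r)} + \sum_{i=1}^{n}\log x_i = 0.$$

The first of the likelihood equations yields $\hat{\beta}(x_1, x_2, \ldots, x_n) = \hat{r}/\bar{x}$, while the second gives

$$n\log\frac{r}{\bar{x}} + \sum_{i=1}^{n}\log x_i - n\frac{\Gamma'(r)}{\Gamma(r)} = 0,$$

that is,

$$\log r - \frac{\Gamma'(r)}{\Gamma(r)} = \log\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\log x_i,$$

which is to be solved for $\hat{r}$. In this case, the likelihood equation is not easily solvable and it is necessary to resort to numerical methods, using tables for $\Gamma'(r)/\Gamma(r)$.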
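In place of tables of $\Gamma'(r)/\Gamma(r)$, the equation for $\hat{r}$ in Example 9 can be solved numerically. The sketch below is one such approach, and the algorithmic choices are ours, not the text's. It computes the digamma function $\psi = \Gamma'/\Gamma$ from the recurrence $\psi(x) = \psi(x+1) - 1/x$ followed by the standard asymptotic series. It then solves $\log r - \psi(r) = \log\bar{x} - n^{-1}\sum\log x_i$ by bisection, relying on $\log r - \psi(r)$ being strictly decreasing in $r$. The simulated data (true $r = 3$, $\beta = 2$) are for illustration only.

```python
import math
import random

def digamma(x):
    # psi(x) = Gamma'(x)/Gamma(x) for x > 0: use psi(x) = psi(x + 1) - 1/x to shift x >= 10,
    # then the asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def gamma_mle(xs, rel_tol=1e-12):
    # solve log r - psi(r) = log(xbar) - mean(log x_i) by bisection, then beta-hat = r-hat / xbar
    n = len(xs)
    xbar = sum(xs) / n
    c = math.log(xbar) - sum(math.log(x) for x in xs) / n   # > 0 unless all x_i are equal

    def g(r):
        return math.log(r) - digamma(r) - c                  # strictly decreasing in r

    lo, hi = 1e-8, 1.0
    while g(hi) > 0:          # expand the bracket until the root lies in [lo, hi]
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    r_hat = 0.5 * (lo + hi)
    return r_hat, r_hat / xbar

rng = random.Random(2024)
sample = [rng.gammavariate(3.0, 0.5) for _ in range(20_000)]   # r = 3, beta = 2 (scale 1/beta = 0.5)
print(gamma_mle(sample))   # roughly (3, 2)
```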


Remark 2. We have seen that MLEs may not be unique, although frequently they are. Also, they are not necessarily unbiased even if a unique MLE exists. In terms of MSE, an MLE may be worthless. Moreover, MLEs may not even exist. We have also seen that MLEs are functions of sufficient statistics. This is a general result, which we now prove.

Theorem 1. Let $T$ be a sufficient statistic for the family of PDFs (PMFs) $\{f_\theta : \theta \in \Theta\}$. If a unique MLE of $\theta$ exists, it is a (nonconstant) function of $T$. If an MLE of $\theta$ exists but is not unique, one can find an MLE that is a function of $T$.
Proof. Since $T$ is sufficient, we can write

$$L(\theta) = f_\theta(\mathbf{x}) = h(\mathbf{x})g_\theta(T(\mathbf{x})),$$

for all $\mathbf{x}$, all $\theta$, and some $h$ and $g_\theta$. If a unique MLE $\hat{\theta}$ exists that maximizes $L(\theta)$, it also maximizes $g_\theta(T(\mathbf{x}))$ and hence $\hat{\theta}$ is a function of $T$. If an MLE of $\theta$ exists but is not unique, we choose a particular MLE $\hat{\theta}$ from the set of all MLEs which is a function of $T$.


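Example 10. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U[\theta - 1, \theta + 1]$, $\theta \in \mathbb{R}$. Then the likelihood function is given by

$$L(\theta; \mathbf{x}) = \left(\frac{1}{2}\right)^n I_{[\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1]}(\mathbf{x}).$$

We note that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is jointly sufficient for $\theta$ and any $\theta$ satisfying

$$\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1,$$

or, equivalently,

$$x_{(n)} - 1 \le \theta \le x_{(1)} + 1,$$

maximizes the likelihood and hence is an MLE for $\theta$. Thus, for $0 < \alpha < 1$,

$$\hat{\theta}_\alpha = \alpha(X_{(n)} - 1) + (1 - \alpha)(X_{(1)} + 1)$$

is an MLE of $\theta$. If $\alpha$ is a constant independent of the $X$'s, then $\hat{\theta}_\alpha$ is a function of $T$. If, on the other hand, $\alpha$ depends on the $X$'s, then $\hat{\theta}_\alpha$ may not be a function of $T$ alone. For example,

$$\hat{\theta}_\alpha = (\sin^2 X_1)(X_{(n)} - 1) + (\cos^2 X_1)(X_{(1)} + 1)$$

is an MLE of $\theta$ but not a function of $T$ alone.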
Theorem 2. Suppose that the regularity conditions of the FCR inequality are satisfied and $\theta$ belongs to an open interval on the real line. If an estimator $\hat{\theta}$ of $\theta$ attains the FCR lower bound for the variance, the likelihood equation has a unique solution $\hat{\theta}$ that maximizes the likelihood.

Proof. If $\hat{\theta}$ attains the FCR lower bound, we have [see (8.5.8)]

$$\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta} = [k(\theta)]^{-1}[\hat{\theta}(\mathbf{X}) - \theta]$$

with probability 1, and the likelihood equation has a unique solution $\theta = \hat{\theta}$.
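Let us write $A(\theta) = [k(\theta)]^{-1}$. Then

$$\frac{\partial^2\log f_\theta(\mathbf{X})}{\partial\theta^2} = A'(\theta)(\hat{\theta} - \theta) - A(\theta),$$

so that

$$\left.\frac{\partial^2\log f_\theta(\mathbf{X})}{\partial\theta^2}\right|_{\theta=\hat{\theta}} = -A(\hat{\theta}).$$

We need only to show that $A(\theta) > 0$. Recall from (8.5.4) with $\psi(\theta) = \theta$ that

$$E_\theta\left\{[T(\mathbf{X}) - \theta]\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right\} = \psi'(\theta) = 1,$$

and substituting $T(\mathbf{X}) - \theta = k(\theta)[\partial\log f_\theta(\mathbf{X})/\partial\theta]$, we get

$$k(\theta)E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 = 1.$$

That is,

$$A(\theta) = E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 > 0,$$

and the proof is complete.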


Remark 3. In Theorem 2 we assumed the differentiability of $A(\theta)$ and the existence of the second-order partial derivative $\partial^2 \log f_\theta/\partial\theta^2$. If the conditions of Theorem 2 are satisfied, the most efficient estimator is necessarily the MLE. It does not follow, however, that every MLE is most efficient. For example, in sampling from a normal population, $\hat\sigma^2 = \sum(X_i - \bar X)^2/n$ is the MLE of $\sigma^2$, but it is not most efficient. Since $\sum(X_i - \bar X)^2/\sigma^2$ is $\chi^2(n-1)$, we see that $\operatorname{var}(\hat\sigma^2) = 2(n-1)\sigma^4/n^2$, which is not equal to the FCR lower bound, $2\sigma^4/n$. Note that $\hat\sigma^2$ is not even an unbiased estimator of $\sigma^2$.
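The following Monte Carlo sketch illustrates Remark 3. It compares the simulated mean and variance of $\hat\sigma^2$ with $(n-1)\sigma^2/n$ and $2(n-1)\sigma^4/n^2$, and with the FCR bound $2\sigma^4/n$. The values of $n$, $\sigma$, the replication count, and the seed are arbitrary choices.

```python
import random
import statistics


def sigma2_mle(xs):
    """MLE of sigma^2 for a normal sample: sum of squared deviations divided by n."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def simulate(n, sigma, reps=20000, seed=0):
    """Monte Carlo mean and variance of sigma2_mle over `reps` normal samples of size n."""
    rng = random.Random(seed)
    est = [sigma2_mle([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)]
    return statistics.fmean(est), statistics.pvariance(est)


n, sigma = 10, 2.0
mc_mean, mc_var = simulate(n, sigma)
exact_mean = (n - 1) * sigma**2 / n          # 3.6: the MLE is biased
exact_var = 2 * (n - 1) * sigma**4 / n**2    # 2.88
fcr_bound = 2 * sigma**4 / n                 # 3.2
print(mc_mean, exact_mean)
print(mc_var, exact_var, fcr_bound)
```

The simulated variance falls below $2\sigma^4/n$. This does not contradict the FCR inequality, because that bound applies to unbiased estimators of $\sigma^2$ and $\hat\sigma^2$ is biased.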


We next consider an important property of MLEs that is not shared by other methods of estimation. Often the parameter of interest is not $\theta$ but some function $h(\theta)$. If $\hat\theta$ is the MLE of $\theta$, what is the MLE of $h(\theta)$? If $\lambda = h(\theta)$ is a one-to-one function of $\theta$, the inverse function $h^{-1}(\lambda) = \theta$ is well defined and we can write the likelihood function as a function of $\lambda$. We have

$$L^*(\lambda; x) = L(h^{-1}(\lambda); x),$$

so that

$$\sup_\lambda L^*(\lambda; x) = \sup_\lambda L(h^{-1}(\lambda); x) = \sup_\theta L(\theta; x).$$

It follows that the supremum of $L^*$ is achieved at $\lambda = h(\hat\theta)$. Thus $h(\hat\theta)$ is the MLE of $h(\theta)$.

In many applications $\lambda = h(\theta)$ is not one-to-one. It is still tempting to take $\hat\lambda = h(\hat\theta)$ as the MLE of $\lambda$. The following result provides a justification.


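Theorem 3 (Zehna [121]). Let $\{f_\theta: \theta \in \Theta\}$ be a family of PDFs (PMFs), and let $L(\theta)$ be the likelihood function. Suppose that $\Theta \subseteq \mathcal R_k$, $k \ge 1$. Let $h: \Theta \to \Lambda$ be a mapping of $\Theta$ onto $\Lambda$, where $\Lambda$ is an interval in $\mathcal R_p$ ($1 \le p \le k$). If $\hat\theta$ is an MLE of $\theta$, then $h(\hat\theta)$ is an MLE of $h(\theta)$.

Proof. For each $\lambda \in \Lambda$, let us define

$$\Theta_\lambda = \{\theta: \theta \in \Theta,\ h(\theta) = \lambda\}$$

and

$$M(\lambda; x) = \sup_{\theta \in \Theta_\lambda} L(\theta; x).$$

Then $M$ defined on $\Lambda$ is called the likelihood function induced by $h$. If $\hat\theta$ is any MLE of $\theta$, then $\hat\theta$ belongs to one and only one set, $\Theta_{\hat\lambda}$ say. Since $\hat\theta \in \Theta_{\hat\lambda}$, $\hat\lambda = h(\hat\theta)$. Now

$$M(\hat\lambda; x) = \sup_{\theta \in \Theta_{\hat\lambda}} L(\theta; x) \ge L(\hat\theta; x),$$

and $\hat\lambda$ maximizes $M$, since

$$M(\hat\lambda; x) \le \sup_{\lambda \in \Lambda} M(\lambda; x) = \sup_{\theta \in \Theta} L(\theta; x) = L(\hat\theta; x),$$

so that $M(\hat\lambda; x) = \sup_{\lambda \in \Lambda} M(\lambda; x)$. It follows that $\hat\lambda$ is an MLE of $h(\theta)$, where $\hat\lambda = h(\hat\theta)$.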


Example 11. Let $X \sim b(1, p)$, $0 \le p \le 1$, and let $h(p) = \operatorname{var}(X) = p(1-p)$. We wish to find the MLE of $h(p)$. Note that $\Lambda = [0, \frac14]$. The function $h$ is not one-to-one. The MLE of $p$ based on a sample of size $n$ is $\hat p(X_1, \ldots, X_n) = \bar X$. Hence the MLE of the parameter $h(p)$ is $h(\bar X) = \bar X(1 - \bar X)$.
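Theorem 3 can be checked numerically for Example 11. Each value $\lambda = p(1-p)$ in $(0, \frac14]$ has the two preimages $p = (1 \pm \sqrt{1 - 4\lambda})/2$. The sketch below computes the induced log-likelihood $\log M(\lambda; x)$ as the larger of the two log-likelihoods and maximizes it over a grid. The data, 7 successes in 20 trials, and the grid are illustrative assumptions.

```python
import math


def log_lik(p, s, n):
    """Bernoulli log-likelihood: s successes in n trials, 0 < p < 1."""
    return s * math.log(p) + (n - s) * math.log(1 - p)


def induced_loglik(lam, s, n):
    """log M(lambda; x): the larger log-likelihood over the two p with p(1 - p) = lambda."""
    r = math.sqrt(max(0.0, 1 - 4 * lam))
    return max(log_lik((1 - r) / 2, s, n), log_lik((1 + r) / 2, s, n))


s, n = 7, 20
xbar = s / n
grid = [0.25 * j / 100000 for j in range(1, 100001)]       # lambda in (0, 1/4]
lam_hat = max(grid, key=lambda lam: induced_loglik(lam, s, n))
print(lam_hat, xbar * (1 - xbar))                         # both about 0.2275
```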


Example 12. Consider a random sample from $G(1, \beta)$. It is required to find the MLE of $\beta$ in the following manner. A sample of size $n$ is taken, and it is known only that $k$, $0 < k < n$, of these observations are $\le M$, where $M$ is a fixed positive number. Let $p = P\{X_1 \le M\} = 1 - e^{-M/\beta}$, so that $-M/\beta = \log(1 - p)$ and $\beta = M/\log[1/(1-p)]$. Therefore, the MLE of $\beta$ is $M/\log[1/(1 - \hat p)]$, where $\hat p$ is the MLE of $p$. To compute the MLE of $p$ we have

$$L(p; x_1, x_2, \ldots, x_n) = p^k(1-p)^{n-k},$$

so that the MLE of $p$ is $\hat p = k/n$. Thus the MLE of $\beta$ is

$$\hat\beta = \frac{M}{\log[n/(n-k)]}.$$
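The estimator of Example 12 depends only on the count $k$. The sketch below computes it and checks it on simulated exponential data. The true $\beta$, $M$, $n$, the seed, and the helper name `beta_mle` are illustrative assumptions.

```python
import math
import random


def beta_mle(k, n, M):
    """MLE of the exponential mean beta when only the count k of observations <= M is known.
    Requires 0 < k < n: for k = n the MLE is 0 and for k = 0 it is infinite."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return M / math.log(n / (n - k))


rng = random.Random(42)
beta, M, n = 2.0, 1.5, 5000
# random.expovariate takes the rate 1/beta, so each draw has mean beta.
k = sum(rng.expovariate(1 / beta) <= M for _ in range(n))
print(k, beta_mle(k, n, M))
```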

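Finally, we consider some important large-sample properties of MLEs. In the following we assume that $\{f_\theta, \theta \in \Theta\}$ is a family of PDFs (PMFs), where $\Theta$ is an open interval on $\mathcal R$. The conditions listed below are stated when $f_\theta$ is a PDF. Modifications for the case where $f_\theta$ is a PMF are obvious and will be left to the reader.

(i) $\partial \log f_\theta/\partial\theta$, $\partial^2 \log f_\theta/\partial\theta^2$, $\partial^3 \log f_\theta/\partial\theta^3$ exist for all $\theta \in \Theta$ and every $x$. Also,

$$\int_{-\infty}^{\infty}\frac{\partial f_\theta(x)}{\partial\theta}\,dx = E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right] = 0 \quad\text{for all } \theta \in \Theta.$$

(ii) $\int_{-\infty}^{\infty}\partial^2 f_\theta(x)/\partial\theta^2\,dx = 0$ for all $\theta \in \Theta$.

(iii) $-\infty < \int_{-\infty}^{\infty}[\partial^2 \log f_\theta(x)/\partial\theta^2]\,f_\theta(x)\,dx < 0$ for all $\theta$.

(iv) There exists a function $H(x)$ such that for all $\theta \in \Theta$,

$$\left|\frac{\partial^3 \log f_\theta(x)}{\partial\theta^3}\right| < H(x) \quad\text{and}\quad \int_{-\infty}^{\infty} H(x) f_\theta(x)\,dx = M(\theta) < \infty.$$

(v) There exists a function $g(\theta)$ that is positive and twice differentiable for every $\theta \in \Theta$, and a function $H(x)$ such that for all $\theta$

$$\left|\frac{\partial^2}{\partial\theta^2}\left[g(\theta)\,\frac{\partial \log f_\theta(x)}{\partial\theta}\right]\right| < H(x) \quad\text{and}\quad \int_{-\infty}^{\infty} H(x) f_\theta(x)\,dx < \infty.$$

Note that condition (v) reduces to condition (iv) when $g(\theta) \equiv 1$.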
We state the following results without proof.

Theorem 4 (Cramér [16])

(a) Conditions (i), (iii), and (iv) imply that with probability approaching 1, as $n \to \infty$, the likelihood equation has a consistent solution.

(b) Conditions (i) through (iv) imply that a consistent solution $\hat\theta_n$ of the likelihood equation is asymptotically normal, that is,

$$\frac{\sqrt n}{\sigma}(\hat\theta_n - \theta) \xrightarrow{L} Z,$$

where $Z$ is $\mathcal N(0, 1)$, and

$$\sigma^2 = \left\{E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2\right\}^{-1}.$$

On occasion one encounters examples where the conditions of Theorem 4 are not satisfied and yet a solution of the likelihood equation is consistent and asymptotically normal.


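Example 13 (Kulldorf [55]). Let $X \sim \mathcal N(0, \theta)$, $\theta > 0$. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. The solution of the likelihood equation is $\hat\theta_n = \sum_{i=1}^n X_i^2/n$. Also, $EX^2 = \theta$, $\operatorname{var}(X^2) = 2\theta^2$, and

$$E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = \frac{1}{2\theta^2}.$$

We note that $\hat\theta_n \xrightarrow{\text{a.s.}} \theta$ and

$$\sqrt n(\hat\theta_n - \theta) = \theta\sqrt2\,\frac{\sum_{i=1}^n X_i^2 - n\theta}{\sqrt{2n}\,\theta} \xrightarrow{L} \mathcal N(0, 2\theta^2).$$

However,

$$\frac{\partial^3 \log f_\theta(x)}{\partial\theta^3} = -\frac{1}{\theta^3} + \frac{3x^2}{\theta^4} \to \infty \quad\text{as } \theta \to 0$$

and is not bounded in $0 < \theta < \infty$. Thus condition (iv) does not hold.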
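Example 13 shows that the conclusion of Theorem 4 can hold even when condition (iv) fails. The sketch below simulates $\sqrt n(\hat\theta_n - \theta)$ and compares its sample mean and variance with $0$ and $2\theta^2$. The values of $\theta$, $n$, the replication count, and the seed are arbitrary choices.

```python
import math
import random
import statistics


def theta_hat(xs):
    """Root of the likelihood equation for N(0, theta): the average of the squares."""
    return sum(x * x for x in xs) / len(xs)


rng = random.Random(7)
theta, n, reps = 1.5, 200, 3000
sd = math.sqrt(theta)                   # random.gauss takes the standard deviation
z = [math.sqrt(n) * (theta_hat([rng.gauss(0.0, sd) for _ in range(n)]) - theta)
     for _ in range(reps)]
print(statistics.fmean(z), statistics.pvariance(z), 2 * theta**2)   # target variance 4.5
```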
The following theorem covers such cases also.

Theorem 5 (Kulldorf [55])

(a) Conditions (i), (iii), and (v) imply that with probability approaching 1 as $n \to \infty$, the likelihood equation has a solution.

(b) Conditions (i), (ii), (iii), and (v) imply that a consistent solution of the likelihood equation is asymptotically normal.

For proofs of Theorems 4 and 5 we refer to Cramér [16, p. 500] and Kulldorf [55].


Remark 4. It is important to note that the results in Theorems 4 and 5 establish the consistency of some root of the likelihood equation but not necessarily that of the MLE when the likelihood equation has several roots. Huzurbazar [44] has shown that under certain conditions the likelihood equation has at most one consistent solution and that the likelihood function has a relative maximum for such a solution. Since there may be several solutions for which the likelihood function has relative maxima, Cramér's and Huzurbazar's results still do not imply that a solution of the likelihood equation that makes the likelihood function an absolute maximum is necessarily consistent.

Wald [114] has shown that under certain conditions the MLE is strongly consistent. It is important to note that Wald does not make any differentiability assumptions.

In any event, if the MLE is a unique solution of the likelihood equation, we can use Theorems 4 and 5 to conclude that it is consistent and asymptotically normal. Note that the asymptotic variance is the same as the lower bound of the FCR inequality.


Example 14. Consider $X_1, X_2, \ldots, X_n$ iid $P(\lambda)$ RVs, $\lambda \in \Theta = (0, \infty)$. The likelihood equation has a unique solution, $\hat\lambda(x_1, \ldots, x_n) = \bar x$, which maximizes the likelihood function. We leave it to the reader to check that the conditions of Theorem 4 hold and that the MLE $\bar X$ is consistent and asymptotically normal with mean $\lambda$ and variance $\lambda/n$, a result that is immediate otherwise.

We leave it to the reader to check that in Example 13 the conditions of Theorem 5 are satisfied.


Remark 5. The invariance and the large-sample properties of MLEs permit us to find MLEs of parametric functions and their limiting distributions. The delta method introduced in Section 7.5 (Theorem 1) comes in handy in these applications. Suppose that in Example 13 we wish to estimate $\psi(\theta) = \theta^2$. By invariance of MLEs, the MLE of $\psi(\theta)$ is $\psi(\hat\theta_n)$, where $\hat\theta_n = \sum_{i=1}^n X_i^2/n$ is the MLE of $\theta$. Applying Theorem 7.5.1 with $\psi'(\theta) = 2\theta$, we see that $\psi(\hat\theta_n)$ is $AN(\theta^2, 8\theta^4/n)$.

In Example 14, suppose that we wish to estimate $\psi(\lambda) = P(X = 0) = e^{-\lambda}$. Then $\psi(\hat\lambda) = e^{-\bar X}$ is the MLE of $\psi(\lambda)$ and, in view of Theorem 7.5.1, $\psi(\hat\lambda) \sim AN(e^{-\lambda}, \lambda e^{-2\lambda}/n)$.
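The sketch below shows the delta-method approximation of Remark 5 at work. It simulates $e^{-\bar X}$ for Poisson samples and compares its mean and variance with $e^{-\lambda}$ and $\lambda e^{-2\lambda}/n$. Python's standard library has no Poisson sampler, so the helper `poisson` (a hypothetical name) uses the classical product-of-uniforms method. The values of $\lambda$, $n$, the replication count, and the seed are illustrative.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Poisson(lam) variate by the product-of-uniforms method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(3)
lam, n, reps = 2.0, 200, 2000
est = []
for _ in range(reps):
    xbar = sum(poisson(rng, lam) for _ in range(n)) / n
    est.append(math.exp(-xbar))            # MLE of P(X = 0) by invariance

avar = lam * math.exp(-2 * lam) / n        # delta-method variance lam e^{-2 lam} / n
print(statistics.fmean(est), math.exp(-lam))
print(statistics.pvariance(est), avar)
```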


Remark 6. The uniqueness of the MLE does not guarantee its asymptotic normality. Consider, for example, a random sample from $U(0, \theta]$. Then $X_{(n)}$ is the unique MLE for $\theta$, and in Problem 8.2.5 we asked the reader to show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.
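Remark 6 can be illustrated by simulation. The sketch below draws samples from $U(0, \theta)$ and forms $W = n(\theta - X_{(n)})$. It reports the mean and variance of $W$, which for the limiting $G(1, \theta)$ law are $\theta$ and $\theta^2$. Because $W \ge 0$ always, its limit cannot be normal. Parameter values and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(11)
theta, n, reps = 3.0, 200, 3000
# W = n (theta - X_(n)) for samples of size n from U(0, theta).
w = [n * (theta - max(rng.uniform(0.0, theta) for _ in range(n))) for _ in range(reps)]
print(statistics.fmean(w), statistics.pvariance(w))   # G(1, theta): mean theta, variance theta**2
print(min(w) >= 0)                                    # W is never negative
```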
PROBLEMS 8.7

1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF (PDF) $f_\theta(x)$. Find an MLE for $\theta$ in each of the following cases:
(a) $f_\theta(x) = \frac12 e^{-|x-\theta|}$, $-\infty < x < \infty$.
(b) $f_\theta(x) = e^{-x+\theta}$, $\theta \le x < \infty$.
(c) $f_\theta(x) = (\theta\alpha)x^{\alpha-1}e^{-\theta x^\alpha}$, $x > 0$, and $\alpha$ known.
(d) $f_\theta(x) = \theta(1-x)^{\theta-1}$, $0 < x < 1$, $\theta > 1$.

2. Find an MLE, if it exists, in each of the following cases:
(a) $X \sim b(n, \theta)$: both $n$ and $\theta \in [0, 1]$ are unknown, and one observation is available.
(b) $X_1, X_2, \ldots, X_n \sim b(1, \theta)$, $\theta \in [\frac12, \frac34]$.
(c) $X_1, X_2, \ldots, X_n \sim \mathcal N(\theta, \theta^2)$, $\theta \in \mathcal R$.
(d) $X_1, X_2, \ldots, X_n$ is a sample from
$$P\{X = y_1\} = \frac{1-\theta}{2}, \quad P\{X = y_2\} = \frac12, \quad P\{X = y_3\} = \frac{\theta}{2} \quad (0 < \theta < 1).$$
(e) $X_1, X_2, \ldots, X_n \sim \mathcal N(\theta, \theta)$, $0 < \theta < \infty$.
(f) $X \sim C(0, \theta)$.

3. Suppose that $n$ observations are taken on an RV $X$ with distribution $\mathcal N(\mu, 1)$, but instead of recording all the observations, one notes only whether or not the observation is less than 0. If $\{X < 0\}$ occurs $m$ $(< n)$ times, find the MLE of $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF
$$f(x; \alpha, \beta) = \beta^{-1}e^{-\beta^{-1}(x-\alpha)}, \quad \alpha < x < \infty,\ -\infty < \alpha < \infty,\ \beta > 0.$$
(a) Find the MLE of $(\alpha, \beta)$.
(b) Find the MLE of $P_{\alpha,\beta}\{X_1 \ge 1\}$.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from the exponential density $f_\theta(x) = \theta e^{-\theta x}$, $x \ge 0$, $\theta > 0$. Find the MLE of $\theta$, and show that it is consistent and asymptotically normal.

6. For Problem 8.6.5 find the MLE for $(\mu, \sigma^2)$.

7. For a sample of size 1 taken from $\mathcal N(\mu, \sigma^2)$, show that no MLE of $(\mu, \sigma^2)$ exists.

8. For Problem 5.2.5 suppose that we wish to estimate $N$ on the basis of observations $X_1, X_2, \ldots, X_M$.
(a) Find the UMVUE of $N$.
(b) Find the MLE of $N$.
(c) Compare the MSEs of the UMVUE and the MLE.
9. Let $X_{ij}$ ($i = 1, 2, \ldots, s$; $j = 1, 2, \ldots, n$) be independent RVs, where $X_{ij} \sim \mathcal N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, s$. Find MLEs for $\mu_1, \mu_2, \ldots, \mu_s$, and $\sigma^2$. Show that the MLE for $\sigma^2$ is not consistent as $s \to \infty$ ($n$ fixed). (Neyman and Scott [75])

10. Let $(X, Y)$ have a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Suppose that $n$ observations are made on the pair $(X, Y)$, and $N - n$ observations on $X$; that is, $N - n$ observations on $Y$ are missing. Find the MLEs of $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. [Hint: If $f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ is the joint PDF of $(X, Y)$, write
$$f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = f_1(x; \mu_1, \sigma_1^2)\,f_{Y|X}(y \mid \beta_x, \sigma_2^2(1 - \rho^2)),$$
where $f_1$ is the marginal (normal) PDF of $X$, and $f_{Y|X}$ is the conditional (normal) PDF of $Y$, given $x$, with mean
$$\beta_x = \left(\mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1\right) + \rho\frac{\sigma_2}{\sigma_1}x$$
and variance $\sigma_2^2(1 - \rho^2)$. Maximize the likelihood function first with respect to $\mu_1$ and $\sigma_1^2$ and then with respect to $\mu_2 - \rho(\sigma_2/\sigma_1)\mu_1$, $\rho\sigma_2/\sigma_1$, and $\sigma_2^2(1 - \rho^2)$.] (Anderson [1])

11. In Problem 5, let $\hat\theta$ denote the MLE of $\theta$. Find the MLE of $\mu = EX_1 = 1/\theta$ and its asymptotic distribution.

12. In Problem 1(d), find the asymptotic distribution of the MLE of $\theta$.

13. In Problem 2(a), find the MLE of $d(\theta) = \theta^2$ and its asymptotic distribution.

14. Let $X_1, X_2, \ldots, X_n$ be a random sample from some DF $F$ on the real line. Suppose that we observe $x_1, x_2, \ldots, x_n$, which are all different. Show that the MLE of $F$ is $F_n^*$, the empirical DF of the sample.

15. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal N(\mu, 1)$. Suppose that $\Theta = \{\mu \ge 0\}$. Find the MLE of $\mu$.

16. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters $n, p_1, \ldots, p_{k-1}$, $0 < p_1, p_2, \ldots, p_{k-1} < 1$, $\sum_{j=1}^{k-1} p_j < 1$, where $n$ is known. Find the MLE of $(p_1, p_2, \ldots, p_{k-1})$.

17. Consider the one-parameter exponential density introduced in Section 5.5 in its natural form with the PDF
$$f_\eta(x) = \exp[\eta T(x) + D(\eta) + S(x)].$$
(a) Show that the MGF of $T(X)$ is given by
$$M(t) = \exp[D(\eta) - D(\eta + t)]$$
for $t$ in some neighborhood of the origin. Moreover, $E_\eta T(X) = -D'(\eta)$, and $\operatorname{var}(T(X)) = -D''(\eta)$.
(b) If the equation $E_\eta T(X) = T(x)$ has a solution, it must be the unique MLE of $\eta$.

18. In Problem 1(b), show that the unique MLE of $\theta$ is consistent. Is it asymptotically normal?


8.8 BAYES AND MINIMAX ESTIMATION 


In this section we consider the problem of point estimation in a decision-theoretic setting. We consider here Bayes and minimax estimation.

Let $\{f_\theta: \theta \in \Theta\}$ be a family of PDFs (PMFs), and let $X_1, X_2, \ldots, X_n$ be a sample from this distribution. Once the sample point $(x_1, x_2, \ldots, x_n)$ is observed, the statistician takes an action on the basis of these data. Let us denote by $\mathcal A$ the set of all actions or decisions open to the statistician.

Definition 1. A decision function $\delta$ is a statistic that takes values in $\mathcal A$; that is, $\delta$ is a Borel-measurable function that maps $\mathcal R_n$ into $\mathcal A$.

If $X = x$ is observed, the statistician takes action $\delta(x) \in \mathcal A$.

Example 1. Let $\mathcal A = \{a_1, a_2\}$. Then any decision function $\delta$ partitions the space of values of $(X_1, \ldots, X_n)$, namely $\mathcal R_n$, into a set $C$ and its complement $C^c$, such that if $x \in C$ we take action $a_1$, and if $x \in C^c$ action $a_2$ is taken. This is the problem of testing hypotheses, which we discuss in Chapter 9.

Example 2. Let $\mathcal A = \Theta$. In this case we face the problem of estimation.

Another element of decision theory is the specification of a loss function, which measures the loss incurred when we take a decision.

Definition 2. Let $\mathcal A$ be an arbitrary space of actions. A nonnegative function $L$ that maps $\Theta \times \mathcal A$ into $\mathcal R$ is called a loss function.

The value $L(\theta, a)$ is the loss to the statistician if he takes action $a$ when $\theta$ is the true parameter value. If we use the decision function $\delta(X)$ and loss function $L$ and $\theta$ is the true parameter value, the loss is the RV $L(\theta, \delta(X))$. (As always, we will assume that $L$ is a Borel-measurable function.)

Definition 3. Let $\mathcal D$ be a class of decision functions that map $\mathcal R_n$ into $\mathcal A$, and let $L$ be a loss function on $\Theta \times \mathcal A$. The function $R$ defined on $\Theta \times \mathcal D$ by

$$R(\theta, \delta) = E_\theta L(\theta, \delta(X)) \tag{1}$$

is known as the risk function associated with $\delta$ at $\theta$.
Example 3. Let $\mathcal A = \Theta \subseteq \mathcal R$, $L(\theta, a) = |\theta - a|^2$. Then

$$R(\theta, \delta) = E_\theta L(\theta, \delta(X)) = E_\theta\{\delta(X) - \theta\}^2,$$

which is just the MSE. If we restrict attention to estimators that are unbiased, the risk is just the variance of the estimator.

The basic problem of decision theory is the following: Given a space of actions $\mathcal A$ and a loss function $L(\theta, a)$, find a decision function $\delta$ in $\mathcal D$ such that the risk $R(\theta, \delta)$ is "minimum" in some sense for all $\theta \in \Theta$. We need first to specify some criterion for comparing the decision functions $\delta$.


Definition 4. The principle of minimax is to choose $\delta^* \in \mathcal D$ so that

$$\max_\theta R(\theta, \delta^*) \le \max_\theta R(\theta, \delta) \tag{2}$$

for all $\delta$ in $\mathcal D$. Such a rule $\delta^*$, if it exists, is called a minimax (decision) rule.

If the problem is one of estimation, that is, if $\mathcal A = \Theta$, we call $\delta^*$ satisfying (2) a minimax estimator of $\theta$.


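Example 4. Let $X \sim b(1, p)$, $p \in \Theta = \{p_1, p_2\} = \{\frac14, \frac12\}$, and $\mathcal A = \{a_1, a_2\}$. Let the loss function be defined as follows.

$$\begin{array}{c|cc} L(p, a) & a_1 & a_2 \\ \hline p_1 = \frac14 & 1 & 4 \\ p_2 = \frac12 & 3 & 2 \end{array}$$

The set of decision rules includes four functions, $\delta_1, \delta_2, \delta_3, \delta_4$, defined by $\delta_1(0) = \delta_1(1) = a_1$; $\delta_2(0) = a_1$, $\delta_2(1) = a_2$; $\delta_3(0) = a_2$, $\delta_3(1) = a_1$; and $\delta_4(0) = \delta_4(1) = a_2$. The risk function takes the following values:

$$\begin{array}{c|cc|c|c} i & R(p_1, \delta_i) & R(p_2, \delta_i) & \max_{p_1, p_2} R(p, \delta_i) & \min_i \max_{p_1, p_2} R(p, \delta_i) \\ \hline 1 & 1 & 3 & 3 & \\ 2 & \frac74 & \frac52 & \frac52 & \frac52 \\ 3 & \frac{13}{4} & \frac52 & \frac{13}{4} & \\ 4 & 4 & 2 & 4 & \end{array}$$

Thus the minimax solution is $\delta_2(x) = a_1$ if $x = 0$ and $= a_2$ if $x = 1$.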
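The risk table of Example 4 can be reproduced in exact arithmetic. The sketch below assumes the loss values $L(\frac14, a_1) = 1$, $L(\frac14, a_2) = 4$, $L(\frac12, a_1) = 3$, $L(\frac12, a_2) = 2$, which are the values implied by the risks of the constant rules $\delta_1$ and $\delta_4$. Encoding each rule as the pair (action at $x = 0$, action at $x = 1$) is an illustrative choice.

```python
from fractions import Fraction as F
from itertools import product

ps = [F(1, 4), F(1, 2)]                          # Theta = {p1, p2}
loss = {(F(1, 4), "a1"): 1, (F(1, 4), "a2"): 4,   # assumed loss values L(p, a)
        (F(1, 2), "a1"): 3, (F(1, 2), "a2"): 2}

# A rule is the pair (action taken if x = 0, action taken if x = 1);
# product() lists them in the order delta_1, delta_2, delta_3, delta_4.
rules = list(product(["a1", "a2"], repeat=2))


def risk(p, rule):
    """R(p, delta) = E_p L(p, delta(X)) for X ~ b(1, p)."""
    return (1 - p) * loss[(p, rule[0])] + p * loss[(p, rule[1])]


max_risk = {rule: max(risk(p, rule) for p in ps) for rule in rules}
minimax = min(rules, key=max_risk.get)
for i, rule in enumerate(rules, 1):
    print(i, rule, [str(risk(p, rule)) for p in ps], str(max_risk[rule]))
print("minimax rule:", minimax)                   # ('a1', 'a2'), i.e. delta_2
```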


The computation of minimax estimators is facilitated by the use of the Bayes estimation method. So far, we have considered $\theta$ as a fixed constant, and $f_\theta(x)$ has represented the PDF (PMF) of the RV $X$. In Bayesian estimation we treat $\theta$ as a random variable distributed according to a PDF (PMF) $\pi(\theta)$ on $\Theta$; $\pi$ is called the a priori distribution. Now $f(x \mid \theta)$ represents the conditional probability density (or mass) function of RV $X$, given that $\theta \in \Theta$ is held fixed. Since $\pi$ is the distribution of $\theta$, it follows that the joint density (PMF) of $\theta$ and $X$ is given by

$$f(x, \theta) = \pi(\theta) f(x \mid \theta). \tag{3}$$

In this framework $R(\theta, \delta)$ is the conditional average loss, $E\{L(\theta, \delta(X)) \mid \theta\}$, given that $\theta$ is held fixed. (Note that we are using the same symbol to denote the RV $\theta$ and a value assumed by it.)

Definition 5. The Bayes risk of a decision function $\delta$ is defined by

$$R(\pi, \delta) = E_\pi R(\theta, \delta). \tag{4}$$

If $\theta$ is a continuous RV and $X$ is of the continuous type, then

$$R(\pi, \delta) = \int R(\theta, \delta)\pi(\theta)\,d\theta = \iint L(\theta, \delta(x)) f(x \mid \theta)\pi(\theta)\,dx\,d\theta = \iint L(\theta, \delta(x)) f(x, \theta)\,dx\,d\theta. \tag{5}$$

If $\theta$ is discrete with PMF $\pi$ and $X$ is of the discrete type, then

$$R(\pi, \delta) = \sum_\theta \sum_x L(\theta, \delta(x)) f(x, \theta). \tag{6}$$

Similar expressions may be written in the other two cases.


Definition 6. A decision function $\delta^*$ is known as a Bayes rule (procedure) if it minimizes the Bayes risk, that is, if

$$R(\pi, \delta^*) = \inf_\delta R(\pi, \delta). \tag{7}$$

Definition 7. The conditional distribution of RV $\theta$, given $X = x$, is called the a posteriori probability distribution of $\theta$, given the sample.

Let the joint PDF (PMF) be expressed in the form

$$f(x, \theta) = g(x)\,h(\theta \mid x), \tag{8}$$

where $g$ denotes the joint marginal density (PMF) of $X$. The a priori PDF (PMF) $\pi(\theta)$ gives the distribution of $\theta$ before the sample is taken, and the a posteriori PDF (PMF) $h(\theta \mid x)$ gives the distribution of $\theta$ after sampling. In terms of $h(\theta \mid x)$ we may write

$$R(\pi, \delta) = \int g(x)\left[\int L(\theta, \delta(x))\,h(\theta \mid x)\,d\theta\right]dx \tag{9}$$

or

$$R(\pi, \delta) = \sum_x g(x)\left[\sum_\theta L(\theta, \delta(x))\,h(\theta \mid x)\right], \tag{10}$$

depending on whether $f$ and $\pi$ are both continuous or both discrete. Similar expressions may be written if only one of $f$ and $\pi$ is discrete.


Theorem 1. Consider the problem of estimation of a parameter $\theta \in \Theta \subseteq \mathcal R$ with respect to the quadratic loss function $L(\theta, \delta) = (\theta - \delta)^2$. A Bayes solution is given by

$$\delta(x) = E\{\theta \mid X = x\}. \tag{11}$$

[$\delta(x)$ defined by (11) is called the Bayes estimator.]

Proof. In the continuous case, if $\pi$ is the prior PDF of $\theta$, then

$$R(\pi, \delta) = \int g(x)\left\{\int [\theta - \delta(x)]^2\,h(\theta \mid x)\,d\theta\right\}dx,$$

where $g$ is the marginal PDF of $X$, and $h$ is the conditional PDF of $\theta$, given $x$. The Bayes rule is a function $\delta$ that minimizes $R(\pi, \delta)$. Since $g \ge 0$, minimization of $R(\pi, \delta)$ is the same as minimization, for each $x$, of

$$\int [\theta - \delta(x)]^2\,h(\theta \mid x)\,d\theta = \operatorname{var}\{\theta \mid x\} + [E\{\theta \mid x\} - \delta(x)]^2,$$

which is minimum if and only if

$$\delta(x) = E\{\theta \mid x\}.$$

The proof for the remaining cases is similar.

Remark 1. The argument used in Theorem 1 shows that a Bayes estimator is one that minimizes $E\{L(\theta, \delta(X)) \mid X\}$. Theorem 1 is a special case which says that if $L(\theta, \delta(X)) = [\theta - \delta(X)]^2$, the function

$$\delta(x) = \int \theta\,h(\theta \mid x)\,d\theta$$

is the Bayes estimator for $\theta$ with respect to $\pi$, the a priori distribution on $\Theta$.
Remark 2. Suppose that $T(X)$ is sufficient for the parameter $\theta$. Then it is easily seen that the posterior distribution of $\theta$ given $x$ depends on $x$ only through $T$, and it follows that the Bayes estimator of $\theta$ is a function of $T$.

Example 5. Let $X \sim b(n, p)$ and $L(p, \delta(x)) = [p - \delta(x)]^2$. Let $\pi(p) = 1$ for $0 < p < 1$ be the a priori PDF of $p$. Then

$$h(p \mid x) = \frac{\binom{n}{x}p^x(1-p)^{n-x}}{\int_0^1 \binom{n}{x}p^x(1-p)^{n-x}\,dp}.$$

It follows that

$$E\{p \mid x\} = \int_0^1 p\,h(p \mid x)\,dp = \frac{x+1}{n+2}.$$

Hence the Bayes estimator is

$$\delta^*(X) = \frac{X+1}{n+2}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int_0^1 \pi(p)\left[\sum_{x=0}^n\left(\frac{x+1}{n+2} - p\right)^2\binom{n}{x}p^x(1-p)^{n-x}\right]dp = \int_0^1 E_p\left(\frac{X+1}{n+2} - p\right)^2 dp$$

$$= \frac{1}{(n+2)^2}\int_0^1 [np(1-p) + (1-2p)^2]\,dp = \frac{1}{6(n+2)}.$$
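The Bayes risk $1/[6(n+2)]$ in Example 5 can be confirmed in exact rational arithmetic. The sketch below expands $(c - p)^2$, with $c = (x+1)/(n+2)$, and integrates each term of the binomial sum using $\int_0^1 p^a(1-p)^b\,dp = a!\,b!/(a+b+1)!$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a, b):
    """Integral over [0, 1] of p^a (1 - p)^b for integers a, b >= 0: a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def bayes_risk(n):
    """Bayes risk of delta*(x) = (x + 1)/(n + 2) under the uniform prior on p, obtained by
    expanding (c - p)^2 C(n, x) p^x (1 - p)^(n - x) and integrating term by term."""
    total = Fraction(0)
    for x in range(n + 1):
        c = Fraction(x + 1, n + 2)
        total += comb(n, x) * (c * c * beta_int(x, n - x)
                               - 2 * c * beta_int(x + 1, n - x)
                               + beta_int(x + 2, n - x))
    return total


for n in (1, 5, 10):
    print(n, bayes_risk(n), Fraction(1, 6 * (n + 2)))
```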


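Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal N(\mu, 1)$, and let the a priori PDF of $\mu$ be $\mathcal N(0, 1)$. Also, let $L(\mu, \delta) = [\mu - \delta(X)]^2$. Then

$$h(\mu \mid x) = \frac{\pi(\mu) f(x \mid \mu)}{g(x)},$$

where

$$g(x) = \int_{-\infty}^{\infty}\pi(\mu) f(x \mid \mu)\,d\mu = \frac{1}{(2\pi)^{(n+1)/2}}\exp\left(-\frac12\sum_{i=1}^n x_i^2\right)\int_{-\infty}^{\infty}\exp\left[-\frac{n+1}{2}\left(\mu^2 - \frac{2n\bar x\mu}{n+1}\right)\right]d\mu$$

$$= \frac{1}{(2\pi)^{n/2}\sqrt{n+1}}\exp\left[-\frac12\left(\sum_{i=1}^n x_i^2 - \frac{n^2\bar x^2}{n+1}\right)\right].$$

It follows that

$$h(\mu \mid x) = \sqrt{\frac{n+1}{2\pi}}\exp\left[-\frac{n+1}{2}\left(\mu - \frac{n\bar x}{n+1}\right)^2\right],$$

and the Bayes estimator is

$$\delta^*(x) = E\{\mu \mid x\} = \frac{n\bar x}{n+1} = \frac{\sum_{i=1}^n x_i}{n+1}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int_{-\infty}^{\infty}\pi(\mu)\left\{\int [\delta^*(x) - \mu]^2 f(x \mid \mu)\,dx\right\}d\mu = \int_{-\infty}^{\infty}E_\mu\left(\frac{n\bar X}{n+1} - \mu\right)^2\pi(\mu)\,d\mu$$

$$= \int_{-\infty}^{\infty}\frac{n + \mu^2}{(n+1)^2}\,\pi(\mu)\,d\mu = \frac{1}{n+1}.$$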
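The Bayes risk can be checked by Monte Carlo for a sample $X_1, \ldots, X_n$ from $\mathcal N(\mu, 1)$ with prior $\mathcal N(0, 1)$. There the Bayes estimator is $n\bar x/(n+1)$ and its Bayes risk is $1/(n+1)$. The sketch below draws $\mu$ from the prior and then the sample given $\mu$. For comparison it also estimates the Bayes risk $1/n$ of $\bar X$. The values of $n$, the replication count, and the seed are arbitrary.

```python
import random


def bayes_estimate(xs):
    """Posterior mean n * xbar / (n + 1) = sum(xs) / (n + 1) under the N(0, 1) prior."""
    return sum(xs) / (len(xs) + 1)


def mc_bayes_risks(n, reps=20000, seed=5):
    """Monte Carlo Bayes risks of the Bayes estimator and of xbar: draw mu from the prior,
    then X_1, ..., X_n from N(mu, 1), and average the squared errors."""
    rng = random.Random(seed)
    bayes_total = mean_total = 0.0
    for _ in range(reps):
        mu = rng.gauss(0.0, 1.0)
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        bayes_total += (bayes_estimate(xs) - mu) ** 2
        mean_total += (sum(xs) / n - mu) ** 2
    return bayes_total / reps, mean_total / reps


n = 4
r_bayes, r_mean = mc_bayes_risks(n)
print(r_bayes, 1 / (n + 1))     # about 0.2
print(r_mean, 1 / n)            # about 0.25
```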








The quadratic loss function used in Theorem 1 is but one example of a loss function in frequent use. Other loss functions that may be used include the absolute-error loss $|\theta - \delta(X)|$ and weighted versions of the squared and absolute errors.

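Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal N(\mu, \sigma^2)$ RVs, with $\sigma^2$ known. It is required to find a Bayes estimator of $\mu$ of the form $\delta(x_1, \ldots, x_n) = \delta(\bar x)$, where $\bar x = \sum_{i=1}^n x_i/n$, using the loss function $L(\mu, \delta) = |\mu - \delta(\bar x)|$. From the argument used in the proof of Theorem 1 (or by Remark 1), the Bayes estimator is one that minimizes the integral $\int |\mu - \delta(\bar x)|\,h(\mu \mid \bar x)\,d\mu$. This will be the case if we choose $\delta$ to be the median of the conditional distribution (see Problem 3.2.5).

Let the a priori distribution of $\mu$ be $\mathcal N(\theta, \tau^2)$. Since $\bar X \sim \mathcal N(\mu, \sigma^2/n)$, we have

$$f(\bar x, \mu) = \frac{\sqrt n}{2\pi\tau\sigma}\exp\left[-\frac{(\mu - \theta)^2}{2\tau^2} - \frac{n(\bar x - \mu)^2}{2\sigma^2}\right].$$

Writing

$$(\bar x - \mu)^2 = (\bar x - \theta + \theta - \mu)^2 = (\bar x - \theta)^2 - 2(\bar x - \theta)(\mu - \theta) + (\mu - \theta)^2,$$

we see that the exponent in $f(\bar x, \mu)$ is

$$-\frac12\left[\left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right)(\mu - \theta)^2 - \frac{2n}{\sigma^2}(\bar x - \theta)(\mu - \theta) + \frac{n}{\sigma^2}(\bar x - \theta)^2\right].$$

It follows that the joint PDF of $\mu$ and $\bar X$ is bivariate normal with means $\theta, \theta$, variances $\tau^2, \tau^2 + (\sigma^2/n)$, and correlation coefficient $\tau/\sqrt{\tau^2 + (\sigma^2/n)}$. The marginal of $\bar X$ is $\mathcal N(\theta, \tau^2 + (\sigma^2/n))$, and the conditional distribution of $\mu$, given $\bar X = \bar x$, is normal with mean

$$\theta + \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}}\cdot\frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}}\,(\bar x - \theta) = \frac{\theta(\sigma^2/n) + \bar x\tau^2}{\tau^2 + (\sigma^2/n)}$$

and variance

$$\tau^2\left[1 - \frac{\tau^2}{\tau^2 + (\sigma^2/n)}\right] = \frac{\tau^2\sigma^2/n}{\tau^2 + (\sigma^2/n)}$$

(see the proof of Theorem 5.4.1). The Bayes estimator is therefore the median of this conditional distribution, and since the distribution is symmetric about the mean,

$$\delta^*(\bar X) = \frac{\theta(\sigma^2/n) + \bar X\tau^2}{\tau^2 + (\sigma^2/n)}$$

is the Bayes estimator of $\mu$.

Clearly, $\delta^*$ is also the Bayes estimator under the quadratic loss function $L(\mu, \delta) = [\mu - \delta(\bar X)]^2$.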
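The sketch below illustrates the key step of Example 7: under absolute-error loss the Bayes estimate is the posterior median. It computes the normal posterior of $\mu$ and draws from it. It then shows that the median of the draws sits at the posterior mean and minimizes the Monte Carlo estimate of $E\{|\mu - c| \mid \bar x\}$. The numerical values of $\bar x$, $n$, $\sigma^2$, $\theta$, $\tau^2$, and the seed are illustrative assumptions.

```python
import random
import statistics


def posterior_params(xbar, n, sigma2, theta, tau2):
    """Mean and variance of mu given xbar, for xbar ~ N(mu, sigma2/n) and mu ~ N(theta, tau2)."""
    s2n = sigma2 / n
    return (theta * s2n + xbar * tau2) / (tau2 + s2n), tau2 * s2n / (tau2 + s2n)


def expected_abs_loss(c, draws):
    """Monte Carlo estimate of E{|mu - c| | xbar} from posterior draws."""
    return sum(abs(d - c) for d in draws) / len(draws)


m, v = posterior_params(xbar=1.3, n=10, sigma2=4.0, theta=0.0, tau2=1.0)
rng = random.Random(0)
draws = [rng.gauss(m, v ** 0.5) for _ in range(20000)]
med = statistics.median(draws)
print(m, med)                       # posterior mean vs. median of the draws
print([round(expected_abs_loss(med + s, draws), 5) for s in (-0.2, -0.1, 0.0, 0.1, 0.2)])
```

The empirical median minimizes the average absolute deviation exactly, so the middle entry of the last printed list is the smallest.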


Key to the derivation of the Bayes estimator is the a posteriori distribution $h(\theta \mid x)$. The derivation of $h(\theta \mid x)$, however, is a three-step process:

1. Find the joint distribution of $X$ and $\theta$, given by $\pi(\theta) f(x \mid \theta)$.

2. Find the marginal distribution with PDF (PMF) $g(x)$ by integrating (summing) over $\theta \in \Theta$.

3. Divide the joint PDF (PMF) by $g(x)$.
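When steps 1 to 3 cannot be carried out in closed form, they can be approximated on a grid of $\theta$ values. The following is a minimal sketch of that idea, a midpoint-rule approximation that does not come from the text. As a check, it reproduces the posterior mean $(x+1)/(n+2)$ and the marginal $g(x) = 1/(n+1)$ for the binomial model with a uniform prior (Example 5). The grid size and the data $n = 10$, $x = 3$ are illustrative.

```python
import math


def grid_posterior(prior, lik, grid):
    """Steps 1-3 on an equally spaced grid of theta values (midpoint rule):
    joint = prior * likelihood, g = integral of the joint, h = joint / g."""
    step = grid[1] - grid[0]
    joint = [prior(t) * lik(t) for t in grid]      # step 1
    g = sum(joint) * step                          # step 2
    return [j / g for j in joint], g               # step 3


n, x = 10, 3
grid = [(j + 0.5) / 2000 for j in range(2000)]     # midpoints of (0, 1)
h, g = grid_posterior(lambda p: 1.0,
                      lambda p: math.comb(n, x) * p**x * (1 - p) ** (n - x),
                      grid)
post_mean = sum(t * w for t, w in zip(grid, h)) * (grid[1] - grid[0])
print(post_mean, (x + 1) / (n + 2))   # Bayes estimate under squared-error loss
print(g, 1 / (n + 1))                 # marginal g(x) under the uniform prior
```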
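It is not always easy to go through these steps in practice. It may not be possible to obtain $h(\theta \mid x)$ in a closed form.

Example 8. Let $X \sim \mathcal N(\mu, 1)$ and let the prior PDF of $\mu$ be given by

$$\pi(\mu) = \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2}, \quad -\infty < \mu < \infty,$$

where $\theta$ is a location parameter. Then the joint PDF of $X$ and $\mu$ is given by

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}}\,e^{-(x - \mu)^2/2}\,\frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2},$$

so that the marginal PDF of $X$ is

$$g(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-(x - \mu)^2/2}\,\frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2}\,d\mu.$$

A closed form for $g$ is not known.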
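Although no closed form for $g$ is known, it is easy to evaluate numerically. The sketch below approximates $g(x)$ for Example 8 with the midpoint rule on a window around $x$ and $\theta$. The window width, the number of nodes, and $\theta = 1$ are illustrative choices. The logistic density is written in terms of $|\mu - \theta|$, which is valid because that density is symmetric about $\theta$ and keeps `exp` from overflowing.

```python
import math


def logistic_prior(mu, theta):
    """pi(mu) = e^{-(mu - theta)} / [1 + e^{-(mu - theta)}]^2, written via |mu - theta|."""
    z = math.exp(-abs(mu - theta))
    return z / (1 + z) ** 2


def marginal_g(x, theta, m=2000, pad=30.0):
    """g(x) = (2 pi)^{-1/2} * integral of exp(-(x - mu)^2 / 2) pi(mu) dmu,
    approximated by the midpoint rule on a window covering both x and theta."""
    lo, hi = min(x, theta) - pad, max(x, theta) + pad
    h = (hi - lo) / m
    total = 0.0
    for k in range(m):
        mu = lo + (k + 0.5) * h
        total += math.exp(-0.5 * (x - mu) ** 2) * logistic_prior(mu, theta)
    return total * h / math.sqrt(2 * math.pi)


theta = 1.0
for x in (-2.0, 0.0, 1.0, 2.0, 4.0):
    print(x, marginal_g(x, theta))
```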


To avoid problems of integration such as that in Example 8, statisticians use conjugate prior distributions. Often, there is a natural parametric family of distributions such that the posterior distributions also belong to the same family. These priors make the computations much easier.

Definition 8. Let $X \sim f(x \mid \theta)$ and let $\pi(\theta)$ be the prior distribution on $\Theta$. Then $\pi$ is said to be a conjugate prior family if the corresponding posterior distribution $h(\theta \mid x)$ belongs to the same family as $\pi(\theta)$.

Example 9. Consider Example 6, where $\pi(\mu)$ is $\mathcal N(0, 1)$ and $h(\mu \mid x)$ is

$$\mathcal N\left(\frac{n\bar x}{n+1}, \frac{1}{n+1}\right),$$

so that both $h$ and $\pi$ belong to the same family. Hence $\mathcal N(0, 1)$ is a conjugate prior for $\mu$.

Example 10. Let $X \sim b(n, p)$, $0 < p < 1$, and let $\pi(p)$ be the beta PDF with parameters $(\alpha, \beta)$. Then

$$h(p \mid x) = \frac{p^{x+\alpha-1}(1-p)^{n-x+\beta-1}}{\int_0^1 p^{x+\alpha-1}(1-p)^{n-x+\beta-1}\,dp} = \frac{p^{x+\alpha-1}(1-p)^{n-x+\beta-1}}{B(x+\alpha, n-x+\beta)},$$

which is also a beta density. Thus the family of beta distributions is a conjugate family of priors for $p$.
Conjugate priors are popular because whenever the prior family is parametric, the posterior distributions are always computable, $h(\theta \mid x)$ being an updated parametric version of $\pi(\theta)$. One no longer needs to go through a computation of $g$, the marginal PDF (PMF) of $X$. Once $h(\theta \mid x)$ is known, $g$, if needed, is easily determined from

$$g(x) = \frac{\pi(\theta) f(x \mid \theta)}{h(\theta \mid x)}.$$

Thus in Example 10, we see easily that

$$g(x) = \binom{n}{x}\frac{B(x+\alpha, n-x+\beta)}{B(\alpha, \beta)}, \quad x = 0, 1, \ldots, n,$$

while in Example 6, $g$ is given by

$$g(x) = \frac{1}{(2\pi)^{n/2}\sqrt{n+1}}\exp\left[-\frac12\left(\sum_{i=1}^n x_i^2 - \frac{n^2\bar x^2}{n+1}\right)\right].$$
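The identity $g(x) = \pi(\theta) f(x \mid \theta)/h(\theta \mid x)$ holds for every $\theta$, so it is easy to verify exactly. The sketch below restricts to integer beta parameters so that everything runs in exact rational arithmetic. That restriction, the prior $(2, 3)$, and the data are illustrative choices. The sketch performs the conjugate update of Example 10, evaluates the identity at $p = 1/3$, and checks that the beta-binomial $g$ sums to 1.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """B(a, b) = (a-1)! (b-1)! / (a+b-1)! for positive integers a, b (exact)."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_pdf(p, a, b):
    """Beta(a, b) density at p, exact for rational p and integer a, b."""
    return p ** (a - 1) * (1 - p) ** (b - 1) / beta_fn(a, b)


def update(prior, x, n):
    """Conjugate update of Example 10: B(alpha, beta) -> B(alpha + x, beta + n - x)."""
    alpha, beta = prior
    return alpha + x, beta + n - x


def marginal(x, n, prior):
    """g(x) = C(n, x) B(x + alpha, n - x + beta) / B(alpha, beta)."""
    alpha, beta = prior
    return comb(n, x) * beta_fn(x + alpha, n - x + beta) / beta_fn(alpha, beta)


prior, n, x = (2, 3), 10, 4
p = Fraction(1, 3)
likelihood = comb(n, x) * p**x * (1 - p) ** (n - x)
via_identity = beta_pdf(p, *prior) * likelihood / beta_pdf(p, *update(prior, x, n))
print(update(prior, x, n), via_identity, marginal(x, n, prior))
```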


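Conjugate priors are usually associated with a wide class of sampling distributions, namely, the exponential family of distributions.

Natural Conjugate Priors

$$\begin{array}{lll}
\text{Sampling PDF (PMF) } f(x \mid \theta) & \text{Prior } \pi(\theta) & \text{Posterior } h(\theta \mid x) \\ \hline
\mathcal N(\theta, \sigma^2) & \mathcal N(\mu, \tau^2) & \mathcal N\left(\dfrac{\sigma^2\mu + \tau^2 x}{\sigma^2 + \tau^2}, \dfrac{\sigma^2\tau^2}{\sigma^2 + \tau^2}\right) \\
G(\nu, \theta) & G(\alpha, \beta) & G(\alpha + \nu, \beta + x) \\
b(n, p) & B(\alpha, \beta) & B(\alpha + x, \beta + n - x) \\
P(\lambda) & G(\alpha, \beta) & G(\alpha + x, \beta + 1) \\
NB(r; p) & B(\alpha, \beta) & B(\alpha + r, \beta + x) \\
G(\nu, 1/\theta) & G(\alpha, \beta) & G(\alpha + \nu, \beta + x)
\end{array}$$

In the gamma rows, the prior and posterior $G(\alpha, \beta)$ denote the gamma distribution with density proportional to $\theta^{\alpha-1}e^{-\beta\theta}$ (shape $\alpha$, rate $\beta$), and in the second row the sampling distribution $G(\nu, \theta)$ has rate $\theta$.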





Another easy way is to use a noninformative prior $\pi(\theta)$, although one needs some integration to obtain $g(x)$.

Definition 9. A PDF $\pi(\theta)$ is said to be a noninformative prior if it contains no information about $\theta$; that is, the distribution does not favor any value of $\theta$ over others.

Example 11. Some simple examples of noninformative priors are $\pi(\theta) = 1$, $\pi(\theta) = 1/\theta$, and $\pi(\theta) = \sqrt{I(\theta)}$, where $I(\theta)$ is the Fisher information. These quite often have infinite total mass, so that the PDF is improper (that is, it does not integrate to 1).

Calculation of $h(\theta \mid x)$ becomes easier, bypassing the calculation of $g(x)$, when $f(x \mid \theta)$ is invariant under a group $\mathcal G$ of transformations, following Fraser's [30] structural theory.

Let $\mathcal G$ be a group of Borel-measurable functions on $\mathcal R_n$ onto itself. The group operation is composition; that is, if $g_1$ and $g_2$ are mappings from $\mathcal R_n$ onto $\mathcal R_n$, $g_2g_1$ is defined by $g_2g_1(x) = g_2(g_1(x))$. Also, $\mathcal G$ is closed under composition and inverse, so that all maps in $\mathcal G$ are one-to-one. We define the group $\mathcal G$ of affine linear transformations $g = \{a, b\}$ by

$$gx = a + bx, \quad a \in \mathcal R,\ b > 0.$$

The inverse of $\{a, b\}$ is

$$\{a, b\}^{-1} = \left\{-\frac{a}{b}, \frac{1}{b}\right\},$$

and the composition of $\{a, b\}$ and $\{c, d\} \in \mathcal G$ is given by

$$\{a, b\}\{c, d\}(x) = \{a, b\}(c + dx) = a + b(c + dx) = (a + bc) + bdx = \{a + bc, bd\}(x).$$

In particular,

$$\{a, b\}\{a, b\}^{-1} = \{a, b\}\left\{-\frac{a}{b}, \frac{1}{b}\right\} = \{0, 1\} = e.$$
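The group rules above are easy to check mechanically. The sketch below represents $\{a, b\}$ as a pair of exact fractions, a representation chosen for illustration. It implements the composition and inverse formulas and confirms that composing an element with its inverse gives the identity $\{0, 1\}$ and that composition agrees with applying the maps in turn.

```python
from fractions import Fraction


def act(g, x):
    """Apply g = {a, b} to x: a + b x."""
    a, b = g
    return a + b * x


def compose(g1, g2):
    """{a, b}{c, d} = {a + b c, b d}: first apply {c, d}, then {a, b}."""
    (a, b), (c, d) = g1, g2
    return (a + b * c, b * d)


def inverse(g):
    """{a, b}^{-1} = {-a/b, 1/b}."""
    a, b = g
    return (-a / b, 1 / b)


g = (Fraction(2), Fraction(3))
k = (Fraction(-1), Fraction(1, 2))
x = Fraction(5, 7)
print(compose(g, inverse(g)))                        # the identity {0, 1}
print(act(compose(g, k), x) == act(g, act(k, x)))    # True
```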


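Example 12. Let $X \sim \mathcal N(\mu, 1)$ and let $\mathcal G$ be the group of translations $\mathcal G = \{\{b, 1\}, -\infty < b < \infty\}$. Let $X_1, \ldots, X_n$ be a sample from $\mathcal N(\mu, 1)$. Then we may write

$$X_i = \{\mu, 1\}Z_i, \quad i = 1, \ldots, n,$$

where $Z_1, \ldots, Z_n$ are iid $\mathcal N(0, 1)$. It is clear that $\bar Z \sim \mathcal N(0, 1/n)$ with PDF

$$\sqrt{\frac{n}{2\pi}}\exp\left(-\frac{n\bar z^2}{2}\right),$$

and there is a one-to-one correspondence between values of $\{\bar z, 1\}$ and $\{\bar x, 1\}$ given by

$$\{\bar x, 1\} = \{\mu, 1\}\{\bar z, 1\} = \{\mu + \bar z, 1\}.$$

Thus $\bar x = \mu + \bar z$ with inverse map $\bar z = \bar x - \mu$. We fix $\bar x$ and consider the variation in $\bar z$ as a function of $\mu$. Changing the PDF element of $\bar z$ to $\mu$, we get

$$\sqrt{\frac{n}{2\pi}}\exp\left[-\frac{n(\mu - \bar x)^2}{2}\right]d\mu$$

as the posterior of $\mu$ given $\bar x$ with prior $\pi(\mu) = 1$.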


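Example 13. Let $X \sim \mathcal N(0, \sigma^2)$ and consider the scale group $\mathcal G = \{\{0, c\}, c > 0\}$. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal N(0, \sigma^2)$. Write

$$X_i = \{0, \sigma\}Z_i, \quad i = 1, 2, \ldots, n,$$

where the $Z_i$ are iid $\mathcal N(0, 1)$ RVs. Then the RV $nS_z^2 = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)$ with the PDF element

$$\frac{1}{\Gamma(n/2)}\left(\frac{ns_z^2}{2}\right)^{n/2-1}\exp\left(-\frac{ns_z^2}{2}\right)d\left(\frac{ns_z^2}{2}\right).$$

The values of $\{0, s_x\}$ are in one-to-one correspondence with those of $\{0, s_z\}$ through

$$\{0, s_x\} = \{0, \sigma\}\{0, s_z\},$$

where $nS_x^2 = \sum_{i=1}^n X_i^2$, so that $s_x = \sigma s_z$. Considering the variation in $s_z$ as a function of $\sigma$ for fixed $s_x$, we see that $|ds_z| = s_x\,(d\sigma/\sigma^2)$. Changing the PDF element of $s_z$ to $\sigma$, we get the PDF of $\sigma$ as

$$\frac{1}{\Gamma(n/2)}\left(\frac{ns_x^2}{2\sigma^2}\right)^{n/2-1}\exp\left(-\frac{ns_x^2}{2\sigma^2}\right)\frac{ns_x^2}{\sigma^3},$$

which is the same as the posterior of $\sigma$ given $s_x$ with prior $\pi(\sigma) = 1/\sigma$.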


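Example 14. Let $X_1, \ldots, X_n$ be a sample from $\mathcal N(\mu, \sigma^2)$ and consider the affine linear group $\mathcal G = \{\{a, b\}, -\infty < a < \infty, b > 0\}$. Then

$$X_i = \{\mu, \sigma\}Z_i, \quad i = 1, \ldots, n,$$

where the $Z_i$'s are iid $\mathcal N(0, 1)$. We know that $\bar Z$ and $S_z^2$ are independent, with $\bar Z \sim \mathcal N(0, 1/n)$ and $(n-1)S_z^2 \sim \chi^2(n-1)$, so that the joint PDF of $(\bar Z, S_z)$ is given by

$$\sqrt{\frac{n}{2\pi}}\exp\left(-\frac{n\bar z^2}{2}\right) \times \frac{2}{\Gamma[(n-1)/2]\,s_z}\left[\frac{(n-1)s_z^2}{2}\right]^{(n-1)/2}\exp\left[-\frac{(n-1)s_z^2}{2}\right], \quad -\infty < \bar z < \infty,\ s_z > 0.$$

Further, the values of $\{\bar z, s_z\}$ are in one-to-one correspondence with the values of $\{\mu, \sigma\}$ through

$$\{\bar x, s_x\} = \{\mu, \sigma\}\{\bar z, s_z\} = \{\mu + \sigma\bar z, \sigma s_z\},$$

so that

$$\bar z = \frac{\bar x - \mu}{\sigma}, \qquad s_z = \frac{s_x}{\sigma}.$$

Consider the variation of $(\bar z, s_z)$ as a function of $(\mu, \sigma)$ for fixed $(\bar x, s_x)$. The Jacobian of the transformation from $\{\bar z, s_z\}$ to $\{\mu, \sigma\}$ is given by

$$J = \begin{vmatrix} -\dfrac{1}{\sigma} & -\dfrac{\bar x - \mu}{\sigma^2} \\ 0 & -\dfrac{s_x}{\sigma^2} \end{vmatrix} = \frac{s_x}{\sigma^3}.$$

Hence, the joint PDF of $(\mu, \sigma)$ given $(\bar x, s_x)$ is given by

$$\sqrt{\frac{n}{2\pi}}\,\frac{1}{\sigma}\exp\left[-\frac{n(\mu - \bar x)^2}{2\sigma^2}\right] \times \frac{2}{\Gamma[(n-1)/2]\,\sigma}\left[\frac{(n-1)s_x^2}{2\sigma^2}\right]^{(n-1)/2}\exp\left[-\frac{(n-1)s_x^2}{2\sigma^2}\right].$$

This is the PDF that one obtains if $\pi(\mu) = 1$ and $\pi(\sigma) = 1/\sigma$ and $\mu$ and $\sigma$ are independent RVs.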


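The following theorem provides a method for determining minimax estimators.

Theorem 2. Let $\{f_\theta: \theta \in \Theta\}$ be a family of PDFs (PMFs), and suppose that an estimator $\delta^*$ of $\theta$ is a Bayes estimator corresponding to an a priori distribution $\pi$ on $\Theta$. If the risk function $R(\theta, \delta^*)$ is constant on $\Theta$, then $\delta^*$ is a minimax estimator for $\theta$.

Proof. Since $\delta^*$ is the Bayes estimator of $\theta$ with constant risk $r^*$ (free of $\theta$), we have

$$r^* = R(\pi, \delta^*) = \int_{-\infty}^{\infty} R(\theta, \delta^*)\pi(\theta)\,d\theta = \inf_{\delta \in \mathcal D}\int_{-\infty}^{\infty} R(\theta, \delta)\pi(\theta)\,d\theta \le \inf_{\delta \in \mathcal D}\sup_{\theta \in \Theta} R(\theta, \delta).$$

Similarly, since $r^* = R(\theta, \delta^*)$ for all $\theta \in \Theta$, we have

$$r^* = \sup_{\theta \in \Theta} R(\theta, \delta^*) \ge \inf_{\delta \in \mathcal D}\sup_{\theta \in \Theta} R(\theta, \delta).$$

Together, we then have

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) = \inf_{\delta \in \mathcal D}\sup_{\theta \in \Theta} R(\theta, \delta),$$

which means that $\delta^*$ is minimax.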


The following examples show how to obtain constant risk estimators and the suitable prior distribution.

Example 15 (Hodges and Lehmann [40]). Let $X \sim b(n, p)$, $0 \le p \le 1$. We seek a minimax estimator of $p$ of the form $\alpha X + \beta$, using the squared-error loss function. We have

$$R(p, \delta) = E_p(\alpha X + \beta - p)^2 = E_p[\alpha(X - np) + \beta + (\alpha n - 1)p]^2 = [(\alpha n - 1)^2 - \alpha^2 n]p^2 + [\alpha^2 n + 2\beta(\alpha n - 1)]p + \beta^2,$$

which is a quadratic in $p$. To find $\alpha$ and $\beta$ such that $R(p, \delta)$ is constant for all $p \in \Theta$, we set the coefficients of $p^2$ and $p$ equal to 0 to get

$$(\alpha n - 1)^2 - \alpha^2 n = 0 \quad\text{and}\quad \alpha^2 n + 2\beta(\alpha n - 1) = 0.$$

It follows that

$$\alpha = \frac{1}{\sqrt n(1 + \sqrt n)} \quad\text{or}\quad \frac{1}{\sqrt n(\sqrt n - 1)}$$

and

$$\beta = \frac{1}{2(1 + \sqrt n)} \quad\text{or}\quad -\frac{1}{2(\sqrt n - 1)}.$$

Since $0 \le p \le 1$, we discard the second set of roots for both $\alpha$ and $\beta$, and then the estimator is of the form

$$\delta^*(X) = \frac{X}{\sqrt n(1 + \sqrt n)} + \frac{1}{2(1 + \sqrt n)}.$$

It remains to show that $\delta^*$ is Bayes against some a priori PDF $\pi$. Consider the natural conjugate a priori PDF

$$\pi(p) = [B(\alpha', \beta')]^{-1}p^{\alpha'-1}(1-p)^{\beta'-1}, \quad 0 < p < 1,\ \alpha', \beta' > 0.$$

The a posteriori PDF of $p$, given $x$, is expressed by

$$h(p \mid x) = \frac{p^{x+\alpha'-1}(1-p)^{n-x+\beta'-1}}{B(x+\alpha', n-x+\beta')}.$$

It follows that

$$E\{p \mid x\} = \frac{B(x+\alpha'+1, n-x+\beta')}{B(x+\alpha', n-x+\beta')} = \frac{x+\alpha'}{n+\alpha'+\beta'},$$

which is the Bayes estimator for a squared-error loss. For this to be of the form $\delta^*$, we must have

$$\frac{1}{\sqrt n(1+\sqrt n)} = \frac{1}{n+\alpha'+\beta'}, \qquad \frac{1}{2(1+\sqrt n)} = \frac{\alpha'}{n+\alpha'+\beta'},$$

giving $\alpha' = \beta' = \sqrt n/2$. It follows that the estimator $\delta^*(x)$ is minimax with constant risk

$$R(p, \delta^*) = \frac{1}{4(1+\sqrt n)^2} \quad\text{for all } p \in [0, 1].$$


Note that the UMVUE (which is also the MLE) is $\delta(X) = X/n$ with risk $R(p, \delta) = p(1-p)/n$. Comparing the two risks (Figs. 1 and 2), we see that

$$\frac{p(1-p)}{n} \ge \frac{1}{4(1+\sqrt n)^2} \quad\text{if and only if}\quad \left|p - \frac12\right| \le \frac{\sqrt{1+2\sqrt n}}{2(1+\sqrt n)},$$

so that

$$R(p, \delta^*) < R(p, \delta)$$

in the interval $(\frac12 - a_n, \frac12 + a_n)$, where $a_n = \sqrt{1+2\sqrt n}/[2(1+\sqrt n)] \to 0$ as $n \to \infty$. Moreover,

$$\frac{\sup_p R(p, \delta)}{\sup_p R(p, \delta^*)} = \frac{1/(4n)}{1/[4(1+\sqrt n)^2]} = \frac{n + 2\sqrt n + 1}{n} \to 1 \quad\text{as } n \to \infty.$$

[Fig. 1. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 1$.]

[Fig. 2. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 9$.]

Clearly, we would prefer the minimax estimator if $n$ is small, and would prefer the UMVUE because of its simplicity if $n$ is large.
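The comparison in Example 15 can be checked directly. The sketch below computes the exact risk of $\delta^*(X) = (X + \sqrt n/2)/(n + \sqrt n)$ by summing over the binomial PMF, which confirms that the risk is constant. It also evaluates the half-width $a_n = \sqrt{1 + 2\sqrt n}/[2(1 + \sqrt n)]$ of the interval on which $\delta^*$ beats $X/n$, together with the ratio of maximum risks. The choice of $n$ values is arbitrary.

```python
import math


def delta_star(x, n):
    """Minimax estimator of Example 15: (X + sqrt(n)/2) / (n + sqrt(n))."""
    return (x + math.sqrt(n) / 2) / (n + math.sqrt(n))


def exact_risk(estimator, p, n):
    """E_p[estimator(X) - p]^2 for X ~ b(n, p), summed over the binomial PMF."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) * (estimator(x, n) - p) ** 2
               for x in range(n + 1))


def a_n(n):
    """Half-width of the interval around 1/2 on which delta* has smaller risk than X/n."""
    return math.sqrt(1 + 2 * math.sqrt(n)) / (2 * (1 + math.sqrt(n)))


for n in (1, 9, 100):
    const = 1 / (4 * (1 + math.sqrt(n)) ** 2)   # constant risk of delta*
    ratio = (1 / (4 * n)) / const               # sup_p R(p, X/n) / sup_p R(p, delta*)
    print(n, const, round(a_n(n), 4), round(ratio, 4))
```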


Example 16 (Hodges and Lehmann [40]). A lot contains N elements, of which D 
are defective. A random sample of size n produces X defectives. We wish to estimate 


D. Clearly, 
D\ (N — DANA 
mu sas (DC) - 


D ; nD(N —n)(N – D) 
EpX LUN and ор = мм 1) ^C 


Proceeding as іп Example 15, we find a linear function of X with constant risk. 
Indeed, Ep(aX + B ~ D)? = B? when 


N an 
> Fa NDE 8 pa ine): 


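We show that $\alpha X + \beta$ is the Bayes estimator corresponding to the a priori PMF
$$P\{D = d\} = c \int_0^1 \binom{N}{d} p^d (1-p)^{N-d} p^{a-1}(1-p)^{b-1}\,dp, \quad d = 0, 1, \ldots, N,$$
where $a, b > 0$ and $c = \Gamma(a+b)/[\Gamma(a)\Gamma(b)]$. First note that $\sum_{d=0}^{N} P\{D = d\} = 1$, so that
$$\sum_{d=0}^{N} \binom{N}{d} \frac{\Gamma(a+b)\,\Gamma(a+d)\,\Gamma(N+b-d)}{\Gamma(a)\,\Gamma(b)\,\Gamma(N+a+b)} = 1.$$
The Bayes estimator is given by
$$\delta(x) = \frac{\sum_{d} d \binom{d}{x}\binom{N-d}{n-x}\binom{N}{d}\Gamma(a+d)\,\Gamma(N+b-d)}{\sum_{d} \binom{d}{x}\binom{N-d}{n-x}\binom{N}{d}\Gamma(a+d)\,\Gamma(N+b-d)}.$$
A little simplification, writing $d = (d - x) + x$ and using
$$\binom{d}{x}\binom{N-d}{n-x}\binom{N}{d} = \binom{N}{n}\binom{n}{x}\binom{N-n}{d-x},$$
yields
$$\delta(x) = x + \frac{\sum_{d}(d-x)\binom{N-n}{d-x}\Gamma(a+d)\,\Gamma(N+b-d)}{\sum_{d}\binom{N-n}{d-x}\Gamma(a+d)\,\Gamma(N+b-d)} = \frac{a+b+N}{a+b+n}\,x + \frac{a(N-n)}{a+b+n}.$$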
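Now putting
$$\alpha = \frac{a+b+N}{a+b+n}, \qquad \beta = \frac{a(N-n)}{a+b+n},$$
and solving for a and b, we get
$$a = \frac{\beta}{\alpha - 1}, \qquad b = \frac{N - n\alpha - \beta}{\alpha - 1}.$$
Since $a > 0$ and $b > 0$, we need $\beta > 0$, $\alpha > 1$, and $N > n\alpha + \beta$. The first and last conditions always hold: $n\alpha < N$, so that $\beta = (N - n\alpha)/2 > 0$ and $N - n\alpha - \beta = \beta > 0$ (in fact, $a = b$). Moreover, $\alpha > 1$ if $N > n + 1$. If $N = n + 1$, the result is obtained if we give D a binomial distribution with parameter $p = \frac{1}{2}$. If $N = n$, the result is immediate.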
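The claims of Example 16 can be checked numerically. The sketch below, with hypothetical helper names and the illustrative choice N = 20, n = 5, computes $E_D(\alpha X + \beta - D)^2$ for every D and compares the posterior mean of D under the beta-binomial prior (with $a = \beta/(\alpha-1)$, $b = (N - n\alpha - \beta)/(\alpha - 1)$) with $\alpha x + \beta$.

```python
from math import comb, sqrt, lgamma, exp

# Illustrative helpers (hypothetical names, not from the text).
def hl_constants(N, n):
    """alpha, beta of Example 16 (assumes N > n + 1 so that alpha > 1)."""
    alpha = N / (n + sqrt(n * (N - n) / (N - 1)))
    beta = (N - n * alpha) / 2
    return alpha, beta

def risk(N, n, D, alpha, beta):
    """E_D(alpha X + beta - D)^2 for hypergeometric X."""
    total = comb(N, n)
    return sum(comb(D, x) * comb(N - D, n - x) / total * (alpha * x + beta - D) ** 2
               for x in range(max(0, n - (N - D)), min(n, D) + 1))

def bayes_estimate(N, n, x, a, b):
    """Posterior mean of D given X = x under the beta-binomial prior with parameters a, b."""
    weights = {}
    for d in range(x, N - (n - x) + 1):          # D must allow x defectives and n - x good items
        log_prior = (lgamma(N + 1) - lgamma(d + 1) - lgamma(N - d + 1)
                     + lgamma(a + d) + lgamma(N + b - d) - lgamma(N + a + b)
                     + lgamma(a + b) - lgamma(a) - lgamma(b))
        weights[d] = exp(log_prior) * comb(d, x) * comb(N - d, n - x)
    total = sum(weights.values())
    return sum(d * w for d, w in weights.items()) / total

N, n = 20, 5
alpha, beta = hl_constants(N, n)
a = beta / (alpha - 1)
b = (N - n * alpha - beta) / (alpha - 1)
print([round(risk(N, n, D, alpha, beta), 10) for D in range(N + 1)])  # all equal beta**2
print([round(bayes_estimate(N, n, x, a, b) - (alpha * x + beta), 10) for x in range(n + 1)])
```

The first list is constant, and the second list is zero up to rounding, which illustrates that $\alpha X + \beta$ is a Bayes rule with constant risk and hence minimax.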


The following theorem, which is an extension of Theorem 2, is of considerable help in proving minimaxity of various estimators.


Theorem 3. Let $\{\pi_k(\theta);\ k \ge 1\}$ be a sequence of prior distributions on Θ and let $\{\delta_k\}$ be the corresponding sequence of Bayes estimators with Bayes risks $R(\pi_k; \delta_k)$. If $\limsup_{k \to \infty} R(\pi_k; \delta_k) = r^*$ and there exists an estimator $\delta^*$ for which
$$\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*,$$
then $\delta^*$ is minimax.
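Proof. Suppose that $\delta^*$ is not minimax. Then there exists an estimator δ such that
$$\sup_{\theta \in \Theta} R(\theta, \delta) < \sup_{\theta \in \Theta} R(\theta, \delta^*).$$
On the other hand, consider the Bayes estimators $\{\delta_k\}$ corresponding to the priors $\{\pi_k(\theta)\}$. We obtain
$$\begin{aligned} R(\pi_k, \delta_k) &= \int R(\theta, \delta_k)\,\pi_k(\theta)\,d\theta && (12)\\ &\le \int R(\theta, \delta)\,\pi_k(\theta)\,d\theta && (13)\\ &\le \sup_{\theta \in \Theta} R(\theta, \delta). && (14) \end{aligned}$$
Taking the lim sup as $k \to \infty$ gives $r^* \le \sup_{\theta} R(\theta, \delta) < \sup_{\theta} R(\theta, \delta^*)$, which contradicts $\sup_{\theta} R(\theta, \delta^*) \le r^*$. Hence $\delta^*$ is minimax.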


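Example 17. Let $X_1, \ldots, X_n$ be a sample of size n from $N(\mu, 1)$. Then the MLE of μ is $\bar{X}$ with variance $1/n$. We show that $\bar{X}$ is minimax. Let $\mu \sim N(0, \tau^2)$. Then the Bayes estimator of μ is $\delta^*_\tau = \bar{X}\,[n\tau^2/(1 + n\tau^2)]$. The Bayes risk of this estimator is
$$R(\pi, \delta^*_\tau) = \frac{\tau^2}{1 + n\tau^2} = \frac{1}{n}\cdot\frac{n\tau^2}{1 + n\tau^2}.$$
Now, as $\tau^2 \to \infty$, $R(\pi, \delta^*_\tau) \to 1/n$, which is the (constant) risk of $\bar{X}$. Hence, by Theorem 3, $\bar{X}$ is minimax.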
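A quick simulation makes the limit in Example 17 concrete. This is a minimal sketch, assuming we draw $\mu \sim N(0, \tau^2)$ and then $\bar{X} \mid \mu \sim N(\mu, 1/n)$ directly; the helper names and the value n = 10 are illustrative only.

```python
import random

# Illustrative helpers (hypothetical names, not from the text).
def bayes_risk_exact(n, tau2):
    # R(pi, delta*_tau) = tau^2 / (1 + n tau^2)
    return tau2 / (1 + n * tau2)

def bayes_risk_mc(n, tau2, reps=100_000, seed=1):
    """Monte Carlo estimate of E[(delta*_tau - mu)^2] with mu ~ N(0, tau^2), Xbar | mu ~ N(mu, 1/n)."""
    rng = random.Random(seed)
    shrink = n * tau2 / (1 + n * tau2)
    total = 0.0
    for _ in range(reps):
        mu = rng.gauss(0.0, tau2 ** 0.5)
        xbar = rng.gauss(mu, (1 / n) ** 0.5)
        total += (shrink * xbar - mu) ** 2
    return total / reps

n = 10
for tau2 in (0.1, 1.0, 10.0, 1000.0):
    print(tau2, bayes_risk_exact(n, tau2), round(bayes_risk_mc(n, tau2), 5))
# As tau2 grows, the Bayes risks increase toward 1/n = 0.1, the constant risk of Xbar.
```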


Definition 10. A decision rule δ is inadmissible if there exists a $\delta^* \in \mathcal{D}$ such that $R(\theta, \delta^*) \le R(\theta, \delta)$ for all $\theta \in \Theta$, where the inequality is strict for some $\theta \in \Theta$; otherwise, δ is admissible.


Theorem 4. If $X_1, \ldots, X_n$ is a sample from $N(\theta, 1)$, then $\bar{X}$ is an admissible estimator of θ under squared-error loss $L(\theta, a) = (\theta - a)^2$.


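Proof. Clearly, $\bar{X} \sim N(\theta, 1/n)$, so that $R(\theta, \bar{X}) = 1/n$. Suppose that $\bar{X}$ is not admissible; then there exists another rule $\delta^*(\mathbf{x})$ such that $R(\theta, \delta^*) \le R(\theta, \bar{X})$ for all θ, while the inequality is strict for some $\theta = \theta_0$ (say). Now, the risk $R(\theta, \delta^*)$ is a continuous function of θ, and hence there exists an $\varepsilon > 0$ such that $R(\theta, \delta^*) < R(\theta, \bar{X}) - \varepsilon$ for $|\theta - \theta_0| < \varepsilon$.

Now consider the prior $N(0, \tau^2)$. Then the Bayes estimator is
$$\delta_\tau(\mathbf{X}) = \bar{X}\,\frac{n\tau^2}{1 + n\tau^2} \quad\text{with risk}\quad R(\pi, \delta_\tau) = \frac{1}{n}\cdot\frac{n\tau^2}{1 + n\tau^2}.$$
Thus
$$R(\pi, \bar{X}) - R(\pi, \delta_\tau) = \frac{1}{n(1 + n\tau^2)}.$$
However,
$$\tau\,[R(\pi, \delta^*) - R(\pi, \bar{X})] = \tau \int_{-\infty}^{\infty} [R(\theta, \delta^*) - R(\theta, \bar{X})]\,\frac{1}{\sqrt{2\pi}\,\tau}\exp\left(-\frac{\theta^2}{2\tau^2}\right) d\theta \le -\frac{\varepsilon}{\sqrt{2\pi}}\int_{\theta_0 - \varepsilon}^{\theta_0 + \varepsilon}\exp\left(-\frac{\theta^2}{2\tau^2}\right) d\theta.$$
Since $\delta_\tau$ is Bayes with respect to π, we get
$$0 \le \tau\,[R(\pi, \delta^*) - R(\pi, \delta_\tau)] = \tau\,[R(\pi, \delta^*) - R(\pi, \bar{X})] + \tau\,[R(\pi, \bar{X}) - R(\pi, \delta_\tau)] \le -\frac{\varepsilon}{\sqrt{2\pi}}\int_{\theta_0 - \varepsilon}^{\theta_0 + \varepsilon}\exp\left(-\frac{\theta^2}{2\tau^2}\right) d\theta + \frac{\tau}{n(1 + n\tau^2)}.$$
The right-hand side goes to $-2\varepsilon^2/\sqrt{2\pi} < 0$ as $\tau \to \infty$, which contradicts the left-hand inequality. Hence $\bar{X}$ is admissible under squared-error loss.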

Thus we have proved that $\bar{X}$ is an admissible minimax estimator of the mean of a
normal distribution $N(\theta, 1)$.
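The key limit in the proof of Theorem 4 can be evaluated numerically. The sketch below computes the right-hand side of the final inequality, using the error function for the Gaussian integral, at the illustrative (hypothetical) values $\varepsilon = 0.1$, $\theta_0 = 1$, $n = 5$, and compares it with $-2\varepsilon^2/\sqrt{2\pi}$.

```python
from math import erf, sqrt, pi

# Illustrative helper (hypothetical name, not from the text).
def rhs(tau, eps, theta0, n):
    """Right-hand side of the final inequality in the proof of Theorem 4."""
    # Integral of exp(-theta^2 / (2 tau^2)) over (theta0 - eps, theta0 + eps) via erf.
    s = tau * sqrt(2)
    integral = tau * sqrt(pi / 2) * (erf((theta0 + eps) / s) - erf((theta0 - eps) / s))
    return -eps / sqrt(2 * pi) * integral + tau / (n * (1 + n * tau ** 2))

eps, theta0, n = 0.1, 1.0, 5
limit = -2 * eps ** 2 / sqrt(2 * pi)
for tau in (1.0, 10.0, 100.0, 1e4):
    print(tau, rhs(tau, eps, theta0, n))
print("limit:", limit)
```

Once τ is moderately large the bound is strictly negative, which is what contradicts $0 \le \tau[R(\pi, \delta^*) - R(\pi, \delta_\tau)]$.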







PROBLEMS 8.8 


1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has essentially three choices: (1) to take an umbrella and face the possible prospect of carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps get drenched; or (3) to just give up the lecture and stay at home. Let $\Theta = \{\theta_1, \theta_2\}$, where $\theta_1$ corresponds to rain, and $\theta_2$, to no rain. Let $\mathcal{A} = \{a_1, a_2, a_3\}$, where $a_i$ corresponds to the choice i, i = 1, 2, 3. Suppose that the following table gives the losses for the decision problem:





The teacher has to make a decision on the basis of a weather report that depends
on θ as follows:









[Table: weather reports $W_1$ (rain) and $W_2$ (no rain) under $\theta_1$ and $\theta_2$]
Find the minimax rule to help the teacher reach a decision.


2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. For estimating λ, using the quadratic error loss function, an a priori distribution over Θ, given by the PDF
$$\pi(\lambda) = e^{-\lambda} \ \text{if } \lambda > 0, \qquad = 0 \ \text{otherwise},$$
is used.
(a) Find the Bayes estimator for λ.
(b) If it is required to estimate $\varphi(\lambda) = e^{-\lambda}$ with the same loss function and same a priori PDF, find the Bayes estimator for $\varphi(\lambda)$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, \theta)$. Consider the class of decision rules δ of the form $\delta(x_1, x_2, \ldots, x_n) = n^{-1}\sum_{i=1}^{n} x_i + \alpha$, where α is a constant to be determined. Find α according to the minimax principle, using the loss function $(\theta - \delta)^2$, where δ is an estimator for θ.


4. Let $\delta^*$ be a minimax estimator for $\psi(\theta)$ with respect to the squared-error loss function. Show that $a\delta^* + b$ (a, b constants) is a minimax estimator for $a\psi(\theta) + b$.


5. Let $X \sim b(n, \theta)$, and suppose that the a priori PDF of θ is $U(0, 1)$. Find the Bayes estimator of θ, using loss function $L(\theta, \delta) = (\theta - \delta)^2/[\theta(1 - \theta)]$. Find a minimax estimator for θ.


6. In Example 5, find the Bayes estimator for $p^2$.




7. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(1, 1/\lambda)$. To estimate λ, let the a priori PDF on λ be $\pi(\lambda) = e^{-\lambda}$, $\lambda > 0$, and let the loss function be squared error. Find the Bayes estimator of λ.


8. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Suppose that the prior distribution of θ is a Pareto PDF $\pi(\theta) = \alpha a^{\alpha}/\theta^{\alpha+1}$ for $\theta \ge a$, $= 0$ for $\theta < a$. Using the quadratic loss function, find the Bayes estimator of θ.


9. Let T be the unique Bayes estimator of θ with respect to the prior density π. Show that T is admissible.


10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$. Take $\pi(\theta) = e^{-\theta}$, $\theta > 0$. Find the Bayes estimator of θ under quadratic loss.


11. For the PDF of Problem 10, consider the estimation of θ under quadratic loss. Consider the class of estimators $a(X_{(1)} - 1/n)$ for all $a > 0$. Show that $X_{(1)} - 1/n$ is minimax in this class.


8.9 PRINCIPLE OF EQUIVARIANCE


Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of distributions of some RV X. Let $\mathcal{X} \subseteq \mathcal{R}_n$ be the sample space of values of X. In Section 8.8 we saw that statistical decision theory revolves around the following four basic elements: the parameter space Θ, the action space $\mathcal{A}$, the sample space $\mathcal{X}$, and the loss function $L(\theta, a)$.


Let $\mathcal{G}$ be a group of transformations that map $\mathcal{X}$ onto itself. We say that $\mathcal{P}$ is invariant under $\mathcal{G}$ if for each $g \in \mathcal{G}$ and every $\theta \in \Theta$, there is a unique $\theta' = \bar{g}\theta \in \Theta$ such that $g(X) \sim P_{\bar{g}\theta}$ whenever $X \sim P_\theta$. Accordingly,
$$P_\theta\{g(X) \in A\} = P_{\bar{g}\theta}\{X \in A\} \qquad (1)$$
for all Borel subsets A in $\mathcal{R}_n$. We note that the invariance of $\mathcal{P}$ under $\mathcal{G}$ does not change the class of distributions we begin with; it only changes the parameter or index θ to $\bar{g}\theta$. The group $\mathcal{G}$ induces $\bar{\mathcal{G}}$, a group of transformations $\bar{g}$ of Θ onto itself.


Example 1. Let $X \sim b(n, p)$, $0 < p < 1$. Let $\mathcal{G} = \{g, e\}$, where $g(x) = n - x$ and $e(x) = x$. Then $g \circ g = e$, so that $g^{-1} = g$. Clearly, $g(X) \sim b(n, 1 - p)$, so that $\bar{g}p = 1 - p$ and $\bar{e}p = p$. The group $\mathcal{G}$ leaves $\{b(n, p);\ 0 < p < 1\}$ invariant.


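Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Consider the group of affine transformations $\mathcal{G} = \{\{a, b\},\ a \in \mathcal{R},\ b > 0\}$ on $\mathcal{X}$. The joint PDF of $\{a, b\}\mathbf{X} = (a + bX_1, \ldots, a + bX_n)$ is given by
$$f(x_1, x_2, \ldots, x_n) = \left(\frac{1}{b\sigma\sqrt{2\pi}}\right)^{n} \exp\left\{-\frac{1}{2b^2\sigma^2}\sum_{i=1}^{n}(x_i - a - b\mu)^2\right\},$$
and we see that
$$\bar{g}(\mu, \sigma) = (a + b\mu,\ b\sigma) = \{a, b\}\{\mu, \sigma\}.$$
Clearly, $\mathcal{G}$ leaves the family of joint PDFs of X invariant.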


To apply invariance considerations to a decision problem we also need to ensure
that the loss function is invariant.


Definition 1. A decision problem is said to be invariant under a group $\mathcal{G}$ if

(i) $\mathcal{P}$ is invariant under $\mathcal{G}$, and

(ii) the loss function L is invariant in the sense that for every $g \in \mathcal{G}$ and $a \in \mathcal{A}$ there is a unique $a' \in \mathcal{A}$ such that
$$L(\theta, a) = L(\bar{g}\theta, a') \quad \text{for all } \theta.$$


The $a' \in \mathcal{A}$ in Definition 1 is uniquely determined by g and may be denoted by $\tilde{g}(a)$. One can show that $\tilde{\mathcal{G}} = \{\tilde{g} : g \in \mathcal{G}\}$ is a group of transformations of $\mathcal{A}$ into itself.


Example 3. Consider the estimation of μ in sampling from $N(\mu, 1)$. In Example 8.9.2 we have shown that the normal family is invariant under the location group $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$. Consider the quadratic loss function
$$L(\mu, a) = (\mu - a)^2.$$
Then $\{b, 1\}a = b + a$ and $\{b, 1\}\{\mu, 1\} = \{b + \mu, 1\}$. Hence
$$L(\{b, 1\}\mu, \{b, 1\}a) = [(b + \mu) - (b + a)]^2 = (\mu - a)^2 = L(\mu, a).$$
Thus $L(\mu, a)$ is invariant under $\mathcal{G}$ and the problem of estimation of μ is invariant under the group $\mathcal{G}$.


Example 4. Consider the normal family $N(0, \sigma^2)$, which is invariant under the scale group $\mathcal{G} = \{\{0, c\},\ c > 0\}$. Let the loss function be
$$L(\sigma^2, a) = \frac{1}{\sigma^4}(\sigma^2 - a)^2.$$
Now $\{0, c\}a = ca$ and $\{0, c\}\{0, \sigma^2\} = \{0, c\sigma^2\}$, and
$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{1}{c^2\sigma^4}(c\sigma^2 - ca)^2 = \frac{1}{\sigma^4}(\sigma^2 - a)^2 = L(\sigma^2, a).$$
Thus the loss function $L(\sigma^2, a)$ is invariant under $\mathcal{G} = \{\{0, c\},\ c > 0\}$ and the problem of estimation of $\sigma^2$ is invariant.


Example 5. Consider the loss function
$$L(\sigma^2, a) = \frac{a}{\sigma^2} - 1 - \log\frac{a}{\sigma^2}$$
for the estimation of $\sigma^2$ from the normal family $N(0, \sigma^2)$. We show that this loss function is invariant under the scale group. Since
$$\{0, c\}\sigma^2 = \{0, c\sigma^2\} \quad\text{and}\quad \{0, c\}\{0, a\} = \{0, ca\},$$
we have
$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{ca}{c\sigma^2} - 1 - \log\frac{ca}{c\sigma^2} = L(\sigma^2, a).$$
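The invariance claims of Examples 4 and 5 are easy to spot-check. The sketch below (hypothetical helper names) draws random $\sigma^2, a, c > 0$, applies the transformation $(\sigma^2, a) \mapsto (c\sigma^2, ca)$ used in the text, and reports the largest change in the loss. Plain squared error $(\sigma^2 - a)^2$ is included for contrast; random checks illustrate, but do not prove, invariance.

```python
import math
import random

# Illustrative helpers (hypothetical names, not from the text).
def loss_ex4(sigma2, a):
    # L(sigma^2, a) = (sigma^2 - a)^2 / sigma^4   (Example 4)
    return (sigma2 - a) ** 2 / sigma2 ** 2

def loss_ex5(sigma2, a):
    # L(sigma^2, a) = a/sigma^2 - 1 - log(a/sigma^2)   (Example 5)
    r = a / sigma2
    return r - 1 - math.log(r)

def max_invariance_gap(loss, trials=1000, seed=0):
    """Largest |L(c sigma^2, c a) - L(sigma^2, a)| over random sigma^2, a, c > 0."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        sigma2, a, c = (rng.uniform(0.1, 10.0) for _ in range(3))
        gap = max(gap, abs(loss(c * sigma2, c * a) - loss(sigma2, a)))
    return gap

print(max_invariance_gap(loss_ex4), max_invariance_gap(loss_ex5))
# A loss that is NOT scale invariant, for contrast: plain squared error.
print(max_invariance_gap(lambda s, a: (s - a) ** 2))
```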


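Let us now return to the problem of estimation of a parametric function $\psi : \Theta \to \mathcal{R}$. For convenience let us take $\Theta \subseteq \mathcal{R}$ and $\psi(\theta) = \theta$. Then $\mathcal{A} = \Theta$ and $\tilde{\mathcal{G}} = \bar{\mathcal{G}}$.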


Suppose that θ is the mean of PDF $f_\theta$, $\mathcal{G} = \{\{b, 1\},\ b \in \mathcal{R}\}$, and $\{f_\theta\}$ is invariant under $\mathcal{G}$. Consider the estimator $\delta(\mathbf{X}) = \bar{X}$. What we want in an estimator $\delta^*$ of θ is that it changes in the same prescribed way as the data are changed. In our case, since X changes to $\{b, 1\}\mathbf{X} = \mathbf{X} + b$, we would like $\bar{X}$ to transform to $\{b, 1\}\bar{X} = \bar{X} + b$.
Definition 2. An estimator $\delta(\mathbf{X})$ of θ is said to be equivariant under $\mathcal{G}$ if
$$\delta(g\mathbf{X}) = \bar{g}\,\delta(\mathbf{X}) \quad\text{for all } g \in \mathcal{G}, \qquad (2)$$
where we have written $g\mathbf{X}$ for $g(\mathbf{X})$ for convenience.


Indeed, g on $\mathcal{X}$ induces $\bar{g}$ on Θ. Thus if $X \sim f_\theta$, then $gX \sim f_{\bar{g}\theta}$, so if $\delta(X)$ estimates θ then $\delta(gX)$ should estimate $\bar{g}\theta$. The principle of equivariance requires that we restrict attention to equivariant estimators and select the "best" estimator in this class in a sense to be described later in this section.


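Example 6. In Example 3, consider the estimators $\delta_1(\mathbf{X}) = \bar{X}$, $\delta_2(\mathbf{X}) = (X_{(1)} + X_{(n)})/2$, and $\delta_3(\mathbf{X}) = \alpha\bar{X}$, α a fixed real number. Then $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$ induces $\bar{\mathcal{G}} = \mathcal{G}$ on Θ, and both $\delta_1$ and $\delta_2$ are equivariant under $\mathcal{G}$. The estimator $\delta_3$ is not equivariant unless $\alpha = 1$. In Example 1, $\delta(X) = X/n$ is an equivariant estimator of p.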


In Example 6, consider the statistic $\delta_4(\mathbf{X}) = S^2$. Note that under the translation group $\{b, 1\}\mathbf{X} = \mathbf{X} + b$ and $\delta_4(\{b, 1\}\mathbf{X}) = \delta_4(\mathbf{X})$. That is, for every $g \in \mathcal{G}$, $\delta_4(g\mathbf{X}) = \delta_4(\mathbf{X})$. A statistic δ is said to be invariant under a group of transformations $\mathcal{G}$ if $\delta(g\mathbf{X}) = \delta(\mathbf{X})$ for all $g \in \mathcal{G}$. When $\mathcal{G}$ is the translation group, an invariant statistic (function) under $\mathcal{G}$ is called location invariant. Similarly, if $\mathcal{G}$ is the scale group, we call δ scale invariant, and if $\mathcal{G}$ is the location-scale group, we call δ location-scale invariant. In Example 6, $\delta_4(\mathbf{X}) = S^2$ is location invariant but not equivariant, whereas $\delta_1(\mathbf{X})$, $\delta_2(\mathbf{X})$, and $\delta_3(\mathbf{X})$ are not location invariant.
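The distinction between equivariant and invariant statistics in Example 6 can be illustrated numerically. The sketch below, with hypothetical helper names, shifts one simulated sample by b and checks whether each statistic moves by b (location equivariant) or stays put (location invariant). A single sample and shift only illustrate the properties; they do not prove them.

```python
import random
import statistics

# Illustrative statistics and checks (hypothetical names, not from the text).
def delta1(x):                 # sample mean
    return statistics.fmean(x)

def delta2(x):                 # midrange (X_(1) + X_(n)) / 2
    return (min(x) + max(x)) / 2

def make_delta3(alpha):        # alpha * sample mean
    return lambda x: alpha * statistics.fmean(x)

def delta4(x):                 # sample variance S^2
    return statistics.variance(x)

def is_location_equivariant(stat, x, b, tol=1e-9):
    return abs(stat([xi + b for xi in x]) - (stat(x) + b)) < tol

def is_location_invariant(stat, x, b, tol=1e-9):
    return abs(stat([xi + b for xi in x]) - stat(x)) < tol

rng = random.Random(42)
x = [rng.gauss(3.0, 1.0) for _ in range(10)]
b = 2.5
for name, stat in [("delta1", delta1), ("delta2", delta2),
                   ("delta3 (alpha=2)", make_delta3(2.0)), ("delta4", delta4)]:
    print(name, is_location_equivariant(stat, x, b), is_location_invariant(stat, x, b))
```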

A very important property of equivariant estimators is that their risk function is constant on the orbits $\{\bar{g}\theta : \bar{g} \in \bar{\mathcal{G}}\}$ of θ.


Theorem 1. Suppose that δ is an equivariant estimator of θ in a problem that is invariant under $\mathcal{G}$. Then the risk function of δ satisfies
$$R(\bar{g}\theta, \delta) = R(\theta, \delta) \qquad (3)$$
for all $\theta \in \Theta$ and $\bar{g} \in \bar{\mathcal{G}}$. If, in particular, $\bar{\mathcal{G}}$ is transitive over Θ, then $R(\theta, \delta)$ is independent of θ.


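Proof. We have for $\theta \in \Theta$ and $g \in \mathcal{G}$,
$$\begin{aligned} R(\theta, \delta(\mathbf{X})) &= E_\theta L(\theta, \delta(\mathbf{X})) \\ &= E_\theta L(\bar{g}\theta, \bar{g}\delta(\mathbf{X})) && \text{(invariance of } L) \\ &= E_\theta L(\bar{g}\theta, \delta(g\mathbf{X})) && \text{(equivariance of } \delta) \\ &= E_{\bar{g}\theta} L(\bar{g}\theta, \delta(\mathbf{X})) && \text{(invariance of } \{P_\theta\}) \\ &= R(\bar{g}\theta, \delta(\mathbf{X})). \end{aligned}$$
In the special case when $\bar{\mathcal{G}}$ is transitive over Θ, for any $\theta_1, \theta_2 \in \Theta$ there exists a $\bar{g} \in \bar{\mathcal{G}}$ such that $\theta_2 = \bar{g}\theta_1$. It follows that
$$R(\theta_2, \delta) = R(\bar{g}\theta_1, \delta) = R(\theta_1, \delta),$$
so that R is independent of θ.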


Remark 1. When the risk function of every equivariant estimator is constant, an estimator (in the class of equivariant estimators) that is obtained by minimizing the constant is called the minimum risk equivariant (MRE) estimator.


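Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF
$$f(x, \theta) = \exp[-(x - \theta)], \quad x \ge \theta, \quad\text{and } = 0 \text{ if } x < \theta.$$
Consider the location group $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$, which induces $\bar{\mathcal{G}}$ on Θ, where $\bar{\mathcal{G}} = \mathcal{G}$. Clearly, $\bar{\mathcal{G}}$ is transitive. Let $L(\theta, \delta) = (\theta - \delta)^2$. Then the problem of estimation of θ is invariant, and according to Theorem 1, the risk of every equivariant estimator is free of θ. The estimator $\delta_0(\mathbf{X}) = X_{(1)} - 1/n$ is equivariant under $\mathcal{G}$ since
$$\delta_0(\{b, 1\}\mathbf{X}) = \min_i (X_i + b) - \frac{1}{n} = b + X_{(1)} - \frac{1}{n} = b + \delta_0(\mathbf{X}).$$
We leave it to the reader to check that
$$R(\theta, \delta_0) = E_\theta\left[X_{(1)} - \frac{1}{n} - \theta\right]^2 = \frac{1}{n^2},$$
and it will be seen later that $\delta_0$ is the MRE estimator of θ.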
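A Monte Carlo sketch of the exercise left to the reader: simulate $X_i = \theta + E_i$ with $E_i \sim$ Exp(1), form $X_{(1)} - 1/n$, and average the squared error. The helper name and the values of n and θ are illustrative. Because the same seed reuses the same exponential draws for every θ, the estimated risk is identical across θ up to rounding, which mirrors the argument of Theorem 1.

```python
import random

# Illustrative helper (hypothetical name, not from the text).
def mc_risk(theta, n, reps=50_000, seed=7):
    """Monte Carlo estimate of E_theta[(X_(1) - 1/n - theta)^2] when X_i - theta ~ Exp(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x_min = min(theta + rng.expovariate(1.0) for _ in range(n))
        total += (x_min - 1 / n - theta) ** 2
    return total / reps

n = 5
for theta in (-3.0, 0.0, 10.0):
    print(theta, round(mc_risk(theta, n), 5), 1 / n ** 2)
```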

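Example 8. In this example we consider sampling from a normal PDF. Let us first consider estimation of μ when σ = 1. Let $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$. Then $\delta(\mathbf{X}) = \bar{X}$ is equivariant under $\mathcal{G}$ and it has the smallest risk $1/n$. Note that $\{\bar{x}, 1\}^{-1} = \{-\bar{x}, 1\}$ may be used to designate x on its orbits:
$$\{\bar{x}, 1\}^{-1}\mathbf{x} = (x_1 - \bar{x}, \ldots, x_n - \bar{x}) = A(\mathbf{x}).$$
Clearly, $A(\mathbf{x})$ is invariant under $\mathcal{G}$ and $A(\mathbf{X})$ is ancillary to μ. By Basu's theorem $A(\mathbf{X})$ and $\bar{X}$ are independent.

Next, consider estimation of $\sigma^2$ with μ = 0 and $\mathcal{G} = \{\{0, c\},\ c > 0\}$. Then $S_X^2 = \sum_{i=1}^{n} X_i^2$ is an equivariant estimator of $\sigma^2$. Note that $\{0, s_x\}^{-1}$, with $s_x = (\sum_{i=1}^{n} x_i^2)^{1/2}$, may be used to designate x on its orbits:
$$\{0, s_x\}^{-1}\mathbf{x} = \left(\frac{x_1}{s_x}, \ldots, \frac{x_n}{s_x}\right) = A(\mathbf{x}).$$
Again, $A(\mathbf{x})$ is invariant under $\mathcal{G}$ and $A(\mathbf{X})$ is ancillary to $\sigma^2$. Moreover, $S_X^2$ and $A(\mathbf{X})$ are independent.

Finally, we consider estimation of $(\mu, \sigma^2)$ when $\mathcal{G} = \{\{b, c\},\ -\infty < b < \infty,\ c > 0\}$. Then $(\bar{X}, S^2)$, where $S^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, is an equivariant estimator of $(\mu, \sigma^2)$. Also, $\{\bar{x}, s_x\}^{-1}$, with $s_x$ the sample standard deviation, may be used to designate x on its orbits:
$$\{\bar{x}, s_x\}^{-1}\mathbf{x} = \left(\frac{x_1 - \bar{x}}{s_x}, \ldots, \frac{x_n - \bar{x}}{s_x}\right) = A(\mathbf{x}).$$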


Note that the statistic $A(\mathbf{X})$ defined in each of the three cases considered in Example 8 is constant on each orbit. A statistic A is said to be maximal invariant if

(i) A is invariant, and

(ii) A is maximal, that is, $A(\mathbf{x}_1) = A(\mathbf{x}_2) \Rightarrow \mathbf{x}_1 = g(\mathbf{x}_2)$ for some $g \in \mathcal{G}$.

We now derive an explicit expression for the MRE estimator for a location parameter. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = f(x - \theta)$, $-\infty < \theta < \infty$. Then $\{f_\theta : \theta \in \Theta\}$ is invariant under $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$, and an estimator δ of θ is equivariant if
$$\delta(\{b, 1\}\mathbf{X}) = \delta(\mathbf{X}) + b$$
for all real b.

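Lemma 1. An estimator δ is equivariant for θ if and only if
$$\delta(\mathbf{X}) = X_1 + q(X_2 - X_1, \ldots, X_n - X_1) \qquad (4)$$
for some function q.

Proof. If (4) holds, then
$$\delta(\{b, 1\}\mathbf{x}) = b + x_1 + q(x_2 - x_1, \ldots, x_n - x_1) = b + \delta(\mathbf{x}).$$
Conversely, if δ is equivariant, then taking $b = x_1$,
$$\delta(\mathbf{x}) = \delta(x_1 + 0,\ x_1 + (x_2 - x_1), \ldots,\ x_1 + (x_n - x_1)) = x_1 + \delta(0, x_2 - x_1, \ldots, x_n - x_1),$$
which is (4) with $q(x_2 - x_1, \ldots, x_n - x_1) = \delta(0, x_2 - x_1, \ldots, x_n - x_1)$.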


From Theorem 1 the risk function of an equivariant estimator δ is constant, with risk
$$R(\theta, \delta) = R(0, \delta) = E_0[\delta(\mathbf{X})]^2 \quad\text{for all } \theta,$$
where the expectation (here under squared-error loss) is with respect to the PDF $f_0(x) = f(x)$. Consequently, among all equivariant estimators δ for θ, the MRE estimator is $\delta_0$, satisfying
$$R(0, \delta_0) = \min_{\delta} R(0, \delta).$$
Thus we only need to choose the function q in (4).
Let $L(\theta, \delta)$ be the loss function. Invariance considerations require that
$$L(\theta, \delta) = L(\bar{g}\theta, \tilde{g}\delta) = L(\theta + b, \delta + b)$$
for all real b, so that $L(\theta, \delta)$ must be some function w of $\delta - \theta$.

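Let $Y_i = X_i - X_1$, $i = 2, \ldots, n$, let $\mathbf{Y} = (Y_2, \ldots, Y_n)$, and let $g(\mathbf{y})$ be the joint PDF of Y under θ = 0. Let $h(x \mid \mathbf{y})$ be the conditional density, under θ = 0, of $X_1$ given $\mathbf{Y} = \mathbf{y}$. Then
$$R(0, \delta) = E_0[w(X_1 + q(\mathbf{Y}))] = \int \left[\int w(x + q(\mathbf{y}))\,h(x \mid \mathbf{y})\,dx\right] g(\mathbf{y})\,d\mathbf{y}. \qquad (5)$$
Then $R(0, \delta)$ will be minimized by choosing, for each fixed y, $q(\mathbf{y}) = -c$, where c is the value that minimizes
$$\int w(u - c)\,h(u \mid \mathbf{y})\,du. \qquad (6)$$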


Necessarily, c (and hence q) depends on y. In the special case $w(d - \theta) = (d - \theta)^2$, the integral in (6) is minimized when c is chosen to be the mean of the conditional distribution. Thus the unique MRE estimator of θ is given by
$$\delta_0(\mathbf{x}) = x_1 - E_0[X_1 \mid \mathbf{Y} = \mathbf{y}]. \qquad (7)$$
This is the Pitman estimator. Let us simplify it a little more by computing $E_0[X_1 \mid \mathbf{Y} = \mathbf{y}]$.

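First we need to compute $h(u \mid \mathbf{y})$. When θ = 0, the joint PDF of $X_1, Y_2, \ldots, Y_n$ is easily seen to be
$$f(x_1)\,f(x_1 + y_2)\cdots f(x_1 + y_n),$$
so the joint PDF of $(Y_2, \ldots, Y_n)$ is given by
$$\int_{-\infty}^{\infty} f(u)\,f(u + y_2)\cdots f(u + y_n)\,du.$$
It follows that
$$h(u \mid \mathbf{y}) = \frac{f(u)\,f(u + y_2)\cdots f(u + y_n)}{\int_{-\infty}^{\infty} f(u)\,f(u + y_2)\cdots f(u + y_n)\,du}. \qquad (8)$$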
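Now let $Z = x_1 - X_1$. Then the conditional PDF of Z given y is $h(x_1 - z \mid \mathbf{y})$. Since $x_1 + y_j = x_j$, it follows from (7) and (8) that
$$\delta_0(\mathbf{x}) = E_0\{Z \mid \mathbf{y}\} = \int_{-\infty}^{\infty} z\,h(x_1 - z \mid \mathbf{y})\,dz = \frac{\int_{-\infty}^{\infty} z \prod_{j=1}^{n} f(x_j - z)\,dz}{\int_{-\infty}^{\infty} \prod_{j=1}^{n} f(x_j - z)\,dz}. \qquad (9)$$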
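Formula (9) lends itself to direct numerical evaluation. The following is a minimal sketch, assuming a one-dimensional midpoint rule on a user-supplied interval [lo, hi] that carries essentially all of the mass of $\prod_j f(x_j - z)$; the function names and data are hypothetical. It recovers $\bar{x}$ for the normal case and $x_{(1)} - 1/n$ for the shifted exponential of Example 7.

```python
import math

# Illustrative helpers (hypothetical names, not from the text).
def pitman_location(x, log_f, lo, hi, m=20_000):
    """Midpoint-rule approximation of (9):
    int z prod f(x_j - z) dz / int prod f(x_j - z) dz over [lo, hi].

    log_f is log f (it may return -inf where f = 0); [lo, hi] must cover
    essentially all of the mass of prod f(x_j - z) as a function of z.
    """
    h = (hi - lo) / m
    zs = [lo + (i + 0.5) * h for i in range(m)]
    logs = [sum(log_f(xj - z) for xj in x) for z in zs]
    top = max(logs)                       # rescale before exponentiating to avoid underflow
    w = [math.exp(v - top) for v in logs]
    return sum(z * wi for z, wi in zip(zs, w)) / sum(w)

def log_normal(u):                        # N(0, 1) log density up to an additive constant
    return -0.5 * u * u

def log_shifted_exp(u):                   # f(u) = e^{-u}, u >= 0 (Example 7)
    return -u if u >= 0 else -math.inf

x = [1.4, 2.0, 1.1, 3.5]
print(pitman_location(x, log_normal, -10.0, 15.0), sum(x) / len(x))
# For Example 7 the integrand vanishes for z > min(x), so integrate exactly up to min(x).
print(pitman_location(x, log_shifted_exp, min(x) - 30.0, min(x)), min(x) - 1 / len(x))
```

Ending the grid exactly at the point where the integrand jumps to zero keeps the midpoint rule accurate; a grid that straddles the jump would lose accuracy.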


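Remark 2. Since the joint PDF of $X_1, X_2, \ldots, X_n$ is $\prod_{j=1}^{n} f_\theta(x_j) = \prod_{j=1}^{n} f(x_j - \theta)$, the joint PDF of θ and X when θ has prior $\pi(\theta)$ is $\pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)$. The joint marginal of X is $\int_{-\infty}^{\infty} \pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)\,d\theta$. It follows that the conditional PDF of θ given $\mathbf{X} = \mathbf{x}$ is given by
$$\frac{\pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)}{\int_{-\infty}^{\infty} \pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)\,d\theta}.$$
Taking $\pi(\theta) = 1$, the improper uniform prior on Θ, we see from (9) that $\delta_0(\mathbf{x})$ is the Bayes estimator of θ under squared-error loss and prior $\pi(\theta) = 1$. Since the risk of $\delta_0$ is constant, it follows that $\delta_0$ is also a minimax estimator of θ.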


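Remark 3. Suppose that S is sufficient for θ. Then $\prod_{j=1}^{n} f_\theta(x_j) = g_\theta(s)h(\mathbf{x})$, so that the Pitman estimator of θ can be rewritten as
$$\delta_0(\mathbf{x}) = \frac{\int_{-\infty}^{\infty} \theta \prod_{j=1}^{n} f_\theta(x_j)\,d\theta}{\int_{-\infty}^{\infty} \prod_{j=1}^{n} f_\theta(x_j)\,d\theta} = \frac{\int_{-\infty}^{\infty} \theta\,g_\theta(s)h(\mathbf{x})\,d\theta}{\int_{-\infty}^{\infty} g_\theta(s)h(\mathbf{x})\,d\theta} = \frac{\int_{-\infty}^{\infty} \theta\,g_\theta(s)\,d\theta}{\int_{-\infty}^{\infty} g_\theta(s)\,d\theta},$$
which is a function of s alone.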


Examples 7 and 8 (continued). A direct computation using (9) shows that $X_{(1)} - 1/n$ is the Pitman MRE estimator of θ in Example 7, and $\bar{X}$ is the MRE estimator of μ in Example 8 (when σ = 1). The results can also be obtained by using sufficiency reduction. In Example 7, $X_{(1)}$ is the minimal sufficient statistic for θ. Every (translation) equivariant function based on $X_{(1)}$ must be of the form $\delta_c(\mathbf{X}) = X_{(1)} + c$, where c is a real number. Then, since $E_\theta X_{(1)} = \theta + 1/n$ makes the cross term vanish,
$$R(\theta, \delta_c) = E_\theta\{X_{(1)} + c - \theta\}^2 = E_\theta\left[X_{(1)} - \frac{1}{n} - \theta + \left(c + \frac{1}{n}\right)\right]^2 = \frac{1}{n^2} + \left(c + \frac{1}{n}\right)^2,$$
which is minimized for $c = -1/n$. In Example 8, $\bar{X}$ is the minimal sufficient statistic, so every equivariant function of $\bar{X}$ must be of the form $\delta_c(\mathbf{X}) = \bar{X} + c$, where c is a real constant. Then
$$R(\mu, \delta_c) = E_\mu(\bar{X} + c - \mu)^2 = \frac{1}{n} + c^2,$$
which is minimized for c = 0.


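Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$. Then $(X_{(1)}, X_{(n)})$ is jointly sufficient for θ. Clearly,
$$\prod_{j=1}^{n} f(x_j - \theta) = \begin{cases} 1, & x_{(n)} - \frac{1}{2} \le \theta \le x_{(1)} + \frac{1}{2}, \\ 0, & \text{otherwise}, \end{cases}$$
so that the Pitman estimator of θ is given by
$$\delta_0(\mathbf{X}) = \frac{\int_{X_{(n)} - 1/2}^{X_{(1)} + 1/2} \theta\,d\theta}{\int_{X_{(n)} - 1/2}^{X_{(1)} + 1/2} d\theta} = \frac{X_{(1)} + X_{(n)}}{2}.$$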
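We now consider, briefly, the Pitman estimator of a scale parameter. Let X have a joint PDF
$$f_\sigma(\mathbf{x}) = \frac{1}{\sigma^n} f\left(\frac{x_1}{\sigma}, \ldots, \frac{x_n}{\sigma}\right),$$
where f is known and σ > 0 is a scale parameter. The family $\{f_\sigma : \sigma > 0\}$ remains invariant under $\mathcal{G} = \{\{0, c\},\ c > 0\}$, which induces $\bar{\mathcal{G}} = \mathcal{G}$ on Θ. Then for estimation of $\sigma^k$ the loss function $L(\sigma, a)$ is invariant under these transformations if and only if $L(\sigma, a) = w(a/\sigma^k)$. An estimator δ of $\sigma^k$ is equivariant under $\mathcal{G}$ if
$$\delta(\{0, c\}\mathbf{X}) = c^k\,\delta(\mathbf{X}) \quad\text{for all } c > 0.$$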


Some simple examples of scale-equivariant estimators of σ are the mean deviation $\sum_{i=1}^{n}|X_i - \bar{X}|/n$ and the standard deviation $\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)}$. We note that the group $\bar{\mathcal{G}}$ over Θ is transitive, so according to Theorem 1, the risk of any equivariant estimator of $\sigma^k$ is free of σ, and an MRE estimator minimizes this risk over the class of all equivariant estimators of $\sigma^k$. Using the loss function $L(\sigma, a) = w(a/\sigma^k) = (a - \sigma^k)^2/\sigma^{2k}$, it can be shown that the MRE estimator of $\sigma^k$, also known as the Pitman estimator of $\sigma^k$, is given by
$$\delta_0(\mathbf{x}) = \frac{\int_0^\infty v^{n+k-1} f(vx_1, \ldots, vx_n)\,dv}{\int_0^\infty v^{n+2k-1} f(vx_1, \ldots, vx_n)\,dv}.$$
Just as in the location case, one can show that $\delta_0$ is a function of the minimal sufficient statistic and that $\delta_0$ is the Bayes estimator of $\sigma^k$ (under squared-error loss) with improper prior $\pi(\sigma) = 1/\sigma^{2k+1}$. Consequently, $\delta_0$ is minimax.


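Example 8 (continued). In Example 8, the Pitman estimator of $\sigma^k$ is easily shown to be
$$\delta_0(\mathbf{X}) = \frac{\Gamma[(n + k)/2]}{\Gamma[(n + 2k)/2]}\left(\frac{\sum_{i=1}^{n} X_i^2}{2}\right)^{k/2}.$$
Thus the MRE estimator of σ is given by $\Gamma[(n+1)/2]\left(\sum_{i=1}^{n} X_i^2/2\right)^{1/2}/\Gamma[(n+2)/2]$ and that of $\sigma^2$ by $\sum_{i=1}^{n} X_i^2/(n + 2)$.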


Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$. The Pitman estimator of θ is given by
$$\delta_0(\mathbf{X}) = \frac{\int_0^{1/X_{(n)}} v^{n}\,dv}{\int_0^{1/X_{(n)}} v^{n+1}\,dv} = \frac{n+2}{n+1}\,X_{(n)}.$$
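The scale-parameter Pitman formula can be evaluated the same way as the location one. This is a minimal sketch, assuming iid data so that $f(v\mathbf{x}) = \prod_i f_1(vx_i)$ for a standard density $f_1$, and a midpoint rule on $(0, v_{\max}]$ beyond which the integrands are negligible or zero; the helper names and data are hypothetical. It reproduces $\sum X_i^2/(n+2)$ and the Γ-ratio formula of Example 8 (continued), and $(n+2)X_{(n)}/(n+1)$ of Example 10.

```python
import math

# Illustrative helpers (hypothetical names, not from the text).
def pitman_scale(x, log_f1, k, v_max, m=50_000):
    """Midpoint-rule approximation of
    int_0^inf v^(n+k-1) f(v x) dv / int_0^inf v^(n+2k-1) f(v x) dv,
    where f(v x) = prod_i f1(v x_i); the integrands are assumed negligible beyond v_max."""
    n = len(x)
    h = v_max / m
    vs = [(i + 0.5) * h for i in range(m)]
    base = [sum(log_f1(v * xi) for xi in x) for v in vs]
    log_num = [(n + k - 1) * math.log(v) + b for v, b in zip(vs, base)]
    log_den = [(n + 2 * k - 1) * math.log(v) + b for v, b in zip(vs, base)]
    top_num, top_den = max(log_num), max(log_den)   # rescale both integrals separately
    num = sum(math.exp(t - top_num) for t in log_num)
    den = sum(math.exp(t - top_den) for t in log_den)
    return math.exp(top_num - top_den) * num / den

def log_std_normal(u):                    # N(0, 1) log density up to an additive constant
    return -0.5 * u * u

def log_std_uniform(u):                   # U(0, 1) log density
    return 0.0 if 0.0 < u < 1.0 else -math.inf

x = [0.5, -1.2, 2.0, 0.7]
t = sum(xi * xi for xi in x)
print(pitman_scale(x, log_std_normal, k=2, v_max=20.0), t / (len(x) + 2))

y = [0.3, 1.7, 0.9, 2.4, 1.1]
# For U(0, theta) the integrand vanishes for v >= 1 / max(y), so stop exactly there.
print(pitman_scale(y, log_std_uniform, k=1, v_max=1 / max(y)),
      (len(y) + 2) / (len(y) + 1) * max(y))
```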

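Finally, we consider, briefly, estimation of the mean vector of a multivariate normal distribution. Let $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_p)'$ be a column vector and $I_p$ be the $p \times p$ identity matrix. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a sample from a p-variate normal distribution with mean vector θ and variance-covariance matrix $I_p$. Let $L(\boldsymbol{\theta}, \mathbf{a}) = (\boldsymbol{\theta} - \mathbf{a})'(\boldsymbol{\theta} - \mathbf{a}) = \sum_{j=1}^{p}(\theta_j - a_j)^2$. In the univariate (p = 1) case we have seen that the sample mean $\bar{X}$ is a minimax and admissible estimator of θ. It is therefore natural to consider $\bar{\mathbf{X}} = (\bar{X}_1, \ldots, \bar{X}_p)'$ as an estimator of θ also in the p-variate case and suspect that it has the same properties as in the p = 1 case. Certainly, $\bar{\mathbf{X}}$ is a minimax estimator, but is it admissible, too? Stein [108] showed that $\bar{\mathbf{X}}$ is admissible for p = 2. But for $p \ge 3$, James and Stein [45] showed that the estimator
$$\hat{\boldsymbol{\theta}}(\mathbf{X}) = \left(1 - \frac{p - 2}{n\,\bar{\mathbf{X}}'\bar{\mathbf{X}}}\right)\bar{\mathbf{X}} \qquad (10)$$
improves on $\bar{\mathbf{X}}$ for all θ, that is, it has strictly smaller risk than $\bar{\mathbf{X}}$ at every θ.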

This is a surprising result, but it is typical of a variety of multiparameter estimation problems. What is optimal in independent estimation problems is not necessarily optimal if the problems are considered simultaneously. It should be noted, however, that $\hat{\boldsymbol{\theta}}$ does not share the other optimality properties of $\bar{\mathbf{X}}$: it is not the MLE, it is biased, and it is not translation equivariant. It only dominates $\bar{\mathbf{X}}$ under quadratic loss.

The estimator $\hat{\boldsymbol{\theta}}$ takes $\bar{\mathbf{X}}$ and shrinks it toward the origin (provided $n\bar{\mathbf{X}}'\bar{\mathbf{X}} > p - 2$).
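The dominance of the James-Stein estimator can be seen in a small simulation. The sketch below assumes the form $(1 - (p-2)/(n\bar{\mathbf{X}}'\bar{\mathbf{X}}))\bar{\mathbf{X}}$ with $\bar{\mathbf{X}} \sim N_p(\boldsymbol{\theta}, I_p/n)$ drawn directly, and uses illustrative (hypothetical) choices p = 5, n = 4, and θ along the diagonal. Under this setup $\bar{\mathbf{X}}$ has risk $p/n = 1.25$ everywhere, while at θ = 0 the James-Stein risk is $[p - (p-2)^2/(p-2)]/n = 2/n = 0.5$.

```python
import random

# Illustrative helper (hypothetical name, not from the text).
def risks(theta, n, reps=20_000, seed=3):
    """Monte Carlo risks of Xbar and of the James-Stein estimator under squared Euclidean loss.
    Xbar is drawn directly as N_p(theta, I/n); the same draws serve both estimators."""
    rng = random.Random(seed)
    p = len(theta)
    sd = (1 / n) ** 0.5
    r_mean = r_js = 0.0
    for _ in range(reps):
        xbar = [rng.gauss(t, sd) for t in theta]
        norm2 = sum(v * v for v in xbar)
        shrink = 1 - (p - 2) / (n * norm2)
        js = [shrink * v for v in xbar]
        r_mean += sum((v - t) ** 2 for v, t in zip(xbar, theta))
        r_js += sum((v - t) ** 2 for v, t in zip(js, theta))
    return r_mean / reps, r_js / reps

p, n = 5, 4
for scale in (0.0, 0.5, 2.0):
    theta = [scale] * p
    print(scale, risks(theta, n))   # Xbar has risk p/n = 1.25 for every theta
```

The gain is largest near the origin (the shrinkage target) and fades as $\|\boldsymbol{\theta}\|$ grows, though it never becomes a loss when $p \ge 3$.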
PROBLEMS 8.9


In all problems assume that $X_1, X_2, \ldots, X_n$ is a random sample from the distribution under consideration.


1. Show that the following statistics are equivariant under the translation group:
(a) Median$(X_i)$.
(b) $(X_{(1)} + X_{(n)})/2$.
(c) $X_{([np]+1)}$, the quantile of order p, $0 < p < 1$.
(d) $(X_{(r+1)} + X_{(r+2)} + \cdots + X_{(n-r)})/(n - 2r)$.
(e) $\bar{X} + \bar{Y}$, where $\bar{Y}$ is the mean of a sample of size m, $m \ne n$.


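2. Show that the following statistics are invariant under the location or scale or location-scale group:
(a) $\bar{X} - \text{median}(X_i)$.
(b) $X_{(n)} - X_{(1)}$.
(c) $\sum_{i=1}^{n}|X_i - \bar{X}|/n$.
(d) $\dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2\right]^{1/2}}$, where $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate distribution.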


3. Let the common distribution be $G(\alpha, \sigma)$, where α (> 0) is known and σ > 0 is unknown. Find the MRE estimator for σ under loss $L(\sigma, a) = (1 - a/\sigma)^2$.


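4. Let the common PDF be the folded normal distribution
$$\sqrt{\frac{2}{\pi}}\exp\left[-\frac{1}{2}(x - \mu)^2\right] I_{[\mu, \infty)}(x).$$
Verify that the best equivariant estimator of μ under quadratic loss is given by
$$\hat{\mu} = \bar{X} - \frac{\exp\left[-(n/2)(X_{(1)} - \bar{X})^2\right]}{\sqrt{n}\int_{-\infty}^{\sqrt{n}(X_{(1)} - \bar{X})}\exp(-z^2/2)\,dz}.$$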
5. Let $X \sim U(\theta, 2\theta)$.
(a) Show that $(X_{(1)}, X_{(n)})$ is a jointly sufficient statistic for θ.
(b) Verify whether or not $(X_{(n)} - X_{(1)})$ is an unbiased estimator of θ. Find an ancillary statistic.
(c) Determine the best invariant estimator of θ under the loss function $L(\theta, a) = (1 - a/\theta)^2$.
6. Let
$$f_\theta(x) = \frac{1}{2}\exp\{-|x - \theta|\}.$$
Find the Pitman estimator of θ.


7. Let $f_\theta(x) = \exp[-(x - \theta)]\cdot\{1 + \exp[-(x - \theta)]\}^{-2}$, for $x \in \mathcal{R}$, $\theta \in \mathcal{R}$. Find the Pitman estimator of θ.


8. Show that an estimator δ is (location) equivariant if and only if
$$\delta(\mathbf{x}) = \delta_0(\mathbf{x}) + \phi(\mathbf{x}),$$
where $\delta_0$ is any equivariant estimator and φ is an invariant function.


9. Let $X_1, X_2$ be iid with PDF
$$f_\sigma(x) = \frac{2}{\sigma}\left(1 - \frac{x}{\sigma}\right), \quad 0 < x < \sigma, \quad\text{and } = 0 \text{ otherwise}.$$
Find, explicitly, the Pitman estimator of $\sigma^k$.


10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF
$$f_\theta(x) = \frac{1}{\theta}\exp\left(-\frac{x}{\theta}\right), \quad x > 0, \quad\text{and } = 0 \text{ otherwise}.$$
Find the Pitman estimator of $\theta^k$.


CHAPTER 9 


Neyman-Pearson Theory of 
Testing of Hypotheses 


9.1 INTRODUCTION 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a population distribution $F_\theta$, $\theta \in \Theta$, where the functional form of $F_\theta$ is known except perhaps for the parameter θ. For example, the $X_i$'s may be a random sample from $N(\theta, 1)$, where $\theta \in \mathcal{R}$ is not known. In many practical problems the experimenter is interested in testing the validity of an assertion about the unknown parameter θ. For example, in a coin-tossing experiment it is of interest to test, in some sense, whether the (unknown) probability of heads p equals a given number $p_0$, $0 < p_0 < 1$. Similarly, it is of interest to check the claim of a car manufacturer about the average mileage per gallon of gasoline achieved by a particular model. A problem of this type is usually referred to as a problem of testing of hypotheses and is the subject of discussion in this chapter. We develop the fundamentals of Neyman-Pearson theory. In Section 9.2 we introduce the various concepts involved. In Section 9.3 the fundamental Neyman-Pearson lemma is proved, and Sections 9.4 and 9.5 deal with some basic results in the testing of composite hypotheses. Section 9.6 deals with locally optimal tests.


9.2 SOME FUNDAMENTAL NOTIONS OF HYPOTHESES TESTING 


In Chapter 8 we discussed the problem of point estimation in sampling from a pop- 
ulation whose distribution is known except for a finite number of unknown parame- 
ters. Here we consider another important problem in statistical inference, the testing 
of statistical hypotheses. We begin by considering the following examples. 


Example 1. In coin-tossing experiments one frequently assumes that the coin is fair, that is, the probability of getting heads or tails is the same: $\frac{1}{2}$. How does one test whether the coin is fair (unbiased) or loaded (biased)? If one is guided by intuition, a reasonable procedure would be to toss the coin n times, say, and count the number of heads. If the proportion of heads observed does not deviate "too much" from $p = \frac{1}{2}$, one would tend to conclude that the coin is fair.
Example 2. It is usual for manufacturers to make quantitative assertions about their products. For example, a manufacturer of 12-volt batteries may claim that a certain brand of their batteries lasts for N hours. How does one go about checking the truth of this assertion? A reasonable procedure suggests itself: Take a random sample of n batteries of the brand in question and note their length of life under more or less identical conditions. If the average length of life is "much smaller" than N, one would tend to doubt the manufacturer's claim.


To fix ideas, let us define formally the concepts involved. As usual, $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ and let $\mathbf{X} \sim F_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$. It will be assumed that the functional form of $F_\theta$ is known except for the parameter θ. Also, we assume that Θ contains at least two points.


Definition 1. A parametric hypothesis is an assertion about the unknown parameter θ. It is usually referred to as the null hypothesis, $H_0: \theta \in \Theta_0 \subset \Theta$. The statement $H_1: \theta \in \Theta_1 = \Theta - \Theta_0$ is usually referred to as the alternative hypothesis.


Usually, the null hypothesis is chosen to correspond to the smaller or simpler subset $\Theta_0$ of Θ and is a statement of "no difference," whereas the alternative represents change.


Definition 2. If $\Theta_0$ ($\Theta_1$) contains only one point, we say that $H_0$ ($H_1$) is simple; otherwise, composite. Thus, if a hypothesis is simple, the probability distribution of X is specified completely under that hypothesis.


Example 3. Let $X \sim N(\mu, \sigma^2)$. If both μ and $\sigma^2$ are unknown, $\Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ \sigma^2 > 0\}$. The hypothesis $H_0: \mu \le \mu_0$, $\sigma^2 > 0$, where $\mu_0$ is a known constant, is a composite null hypothesis. The alternative hypothesis is $H_1: \mu > \mu_0$, $\sigma^2 > 0$, which is also composite. Similarly, the null hypothesis $\mu = \mu_0$, $\sigma^2 > 0$ is composite.

If $\sigma^2 = \sigma_0^2$ is known, the hypothesis $H_0: \mu = \mu_0$ is a simple hypothesis.


Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Some hypotheses of interest are $p = \frac{1}{2}$, $p \le \frac{1}{2}$, $p \ge \frac{1}{2}$ or, quite generally, $p = p_0$, $p \le p_0$, $p \ge p_0$, where $p_0$ is a known number, $0 < p_0 < 1$.


The problem of testing of hypotheses may be described as follows: Given the sample point $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, find a decision rule (function) that will lead to a decision to reject or fail to reject the null hypothesis. In other words, partition the sample space into two disjoint sets C and $C^c$ such that if $\mathbf{x} \in C$, we reject $H_0$, and if $\mathbf{x} \in C^c$, we fail to reject $H_0$. In the following we write "accept $H_0$" when we fail to reject $H_0$. We emphasize that when the sample point $\mathbf{x} \in C^c$ and we fail to reject $H_0$, it does not mean that $H_0$ gets our stamp of approval. It simply means that the sample does not have enough evidence against $H_0$.
Definition 3. Let $\mathbf{X} \sim F_\theta$, $\theta \in \Theta$. A subset C of $\mathcal{R}_n$ such that if $\mathbf{x} \in C$, then $H_0$ is rejected (with probability 1) is called the critical region (set):
$$C = \{\mathbf{x} \in \mathcal{R}_n : H_0 \text{ is rejected if } \mathbf{x} \in C\}.$$
There are two types of errors that can be made if one uses such a procedure. One may reject $H_0$ when in fact it is true, called a type I error, or accept $H_0$ when it is false, called a type II error:





                     True
                     H0               H1
Accept   H0          Correct          Type II error
         H1          Type I error     Correct





If C is the critical region of a rule, $P_\theta(C)$, $\theta \in \Theta_0$, is a probability of type I error, and $P_\theta(C^c)$, $\theta \in \Theta_1$, is a probability of type II error. Ideally, one would like to find a critical region for which both these probabilities are 0. This will be the case if we can find a subset $S \subseteq \mathcal{R}_n$ such that $P_\theta(S) = 1$ for every $\theta \in \Theta_0$ and $P_\theta(S) = 0$ for every $\theta \in \Theta_1$. Unfortunately, situations such as this do not arise in practice, although they are conceivable. For example, let $X \sim C(1, \theta)$ under $H_0$ and $X \sim P(\theta)$ under $H_1$; then S, the complement of the set of nonnegative integers, has this property. Usually, if a critical region is such that the probability of type I error is 0, it will be of the form "do not reject $H_0$," and the probability of type II error will then be 1.

The procedure used in practice is to limit the probability of type I error to a preassigned level α (usually, 0.01 or 0.05) that is small and to minimize the probability of type II error. To restate our problem in terms of this requirement, let us formulate these notions.


Definition 4. Every Borel-measurable mapping φ of $\mathcal{R}_n \to [0, 1]$ is known as a test function.


Some simple examples of test functions are $\varphi(\mathbf{x}) = 1$ for all $\mathbf{x} \in \mathcal{R}_n$, $\varphi(\mathbf{x}) = 0$ for all $\mathbf{x} \in \mathcal{R}_n$, or $\varphi(\mathbf{x}) = \alpha$, $0 \le \alpha \le 1$, for all $\mathbf{x} \in \mathcal{R}_n$. In fact, Definition 4 includes Definition 3 in the sense that whenever φ is the indicator function of some Borel subset A of $\mathcal{R}_n$, A is called the critical region (of the test φ).


Definition 5. The mapping φ is said to be a test of hypothesis $H_0: \theta \in \Theta_0$ against the alternatives $H_1: \theta \in \Theta_1$, with error probability α (also called level of significance or, simply, level) if
$$E_\theta\varphi(\mathbf{X}) \le \alpha \quad\text{for all } \theta \in \Theta_0. \qquad (1)$$
We shall say, in short, that φ is a test for the problem $(\alpha, \Theta_0, \Theta_1)$.
Let us write $\beta_\varphi(\theta) = E_\theta\varphi(\mathbf{X})$. Our objective, in practice, will be to seek a test φ for a given α, $0 \le \alpha \le 1$, such that
$$\sup_{\theta \in \Theta_0} \beta_\varphi(\theta) \le \alpha. \qquad (2)$$
The left-hand side of (2) is usually known as the size of the test φ. Condition (1) therefore restricts attention to tests whose size does not exceed a given level of significance α.

The following interpretation may be given to all tests φ satisfying $\beta_\varphi(\theta) \le \alpha$ for all $\theta \in \Theta_0$. To every $\mathbf{x} \in \mathcal{R}_n$ we assign a number $\varphi(\mathbf{x})$, $0 \le \varphi(\mathbf{x}) \le 1$, which is the probability of rejecting $H_0$ that $\mathbf{X} \sim f_\theta$, $\theta \in \Theta_0$, if x is observed. The restriction $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$ then says that if $H_0$ were true, φ rejects it with a probability $\le \alpha$. We will call such a test a randomized test function. If $\varphi(\mathbf{x}) = I_A(\mathbf{x})$, φ will be called a nonrandomized test. If $\mathbf{x} \in A$, we reject $H_0$ with probability 1; and if $\mathbf{x} \notin A$, this probability is 0. Needless to say, $A \in \mathcal{B}_n$.

We next turn our attention to the type II error. 


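Definition 6. Let $\varphi$ be a test function for the problem $(\alpha, \Theta_0, \Theta_1)$. For every $\theta \in \Theta$, define

(3) $$\beta_\varphi(\theta) = E_\theta \varphi(\mathbf{X}) = P_\theta\{\text{reject } H_0\}.$$

As a function of $\theta$, $\beta_\varphi(\theta)$ is called the power function of the test $\varphi$. For any $\theta \in \Theta_1$, $\beta_\varphi(\theta)$ is called the power of $\varphi$ against the alternative $\theta$.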


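In view of Definitions 5 and 6, the problem of testing of hypotheses may now be reformulated. Let $\mathbf{X} \sim f_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, $\Theta = \Theta_0 + \Theta_1$. Also, let $0 < \alpha < 1$ be given. Given a sample point $\mathbf{x}$, find a test $\varphi(\mathbf{x})$ such that $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$, and $\beta_\varphi(\theta)$ is a maximum for $\theta \in \Theta_1$.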


Definition 7. Let $\Phi_\alpha$ be the class of all tests for the problem $(\alpha, \Theta_0, \Theta_1)$. A test $\varphi_0 \in \Phi_\alpha$ is said to be a most powerful (MP) test against an alternative $\theta \in \Theta_1$ if

(4) $$\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha.$$

If $\Theta_1$ contains only one point, this definition suffices. If, on the other hand, $\Theta_1$ contains at least two points, as will usually be the case, we will have an MP test corresponding to each $\theta \in \Theta_1$.


Definition 8. A test $\varphi_0 \in \Phi_\alpha$ for the problem $(\alpha, \Theta_0, \Theta_1)$ is said to be a uniformly most powerful (UMP) test if

(5) $$\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha, \text{ uniformly in } \theta \in \Theta_1.$$

Thus, if $\Theta_0$ and $\Theta_1$ are both composite, the problem is to find a UMP test $\varphi$ for the problem $(\alpha, \Theta_0, \Theta_1)$. We will see that UMP tests very frequently do not exist, and we will have to place further restrictions on the class of all tests, $\Phi_\alpha$.
Note that, if $\varphi_1$, $\varphi_2$ are two tests and $\lambda$ is a real number, $0 < \lambda < 1$, then $\lambda\varphi_1 + (1 - \lambda)\varphi_2$ is also a test function, and it follows that the class of all test functions $\Phi_\alpha$ is convex.


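Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs, where $\mu$ is unknown but it is known that $\mu \in \Theta = \{\mu_0, \mu_1\}$, $\mu_0 < \mu_1$. Let $H_0$: $X_i \sim \mathcal{N}(\mu_0, 1)$, $H_1$: $X_i \sim \mathcal{N}(\mu_1, 1)$. Both $H_0$ and $H_1$ are simple hypotheses. Intuitively, one would accept $H_0$ if the sample mean $\bar{X}$ is "closer" to $\mu_0$ than to $\mu_1$; that is, one would reject $H_0$ if $\bar{X} > k$, and accept $H_0$ otherwise. The constant $k$ is determined from the level requirements. Note that under $H_0$, $\bar{X} \sim \mathcal{N}(\mu_0, 1/n)$, and under $H_1$, $\bar{X} \sim \mathcal{N}(\mu_1, 1/n)$. Given $0 < \alpha < 1$, we have

$$P_{\mu_0}\{\bar{X} > k\} = P_{\mu_0}\left\{\frac{\bar{X} - \mu_0}{1/\sqrt{n}} > \frac{k - \mu_0}{1/\sqrt{n}}\right\} = P\{\text{type I error}\} = \alpha,$$

so that $k = \mu_0 + z_\alpha/\sqrt{n}$. The test, therefore, is (Fig. 1)

$$\varphi(\mathbf{x}) = \begin{cases} 1, & \text{if } \bar{x} > \mu_0 + \dfrac{z_\alpha}{\sqrt{n}},\\[4pt] 0, & \text{otherwise.} \end{cases}$$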


Here $\bar{X}$ is known as a test statistic, and the test $\varphi$ is nonrandomized with critical region $C = \{\mathbf{x}: \bar{x} > \mu_0 + z_\alpha/\sqrt{n}\}$. Note that in this case the continuity of $\bar{X}$ (that is, the absolute continuity of the DF of $\bar{X}$) allows us to achieve any size $\alpha$, $0 < \alpha < 1$. The power of the test at $\mu_1$ is given by


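$$E_{\mu_1}\varphi(\mathbf{X}) = P_{\mu_1}\left\{\bar{X} > \mu_0 + \frac{z_\alpha}{\sqrt{n}}\right\} = P_{\mu_1}\left\{\frac{\bar{X} - \mu_1}{1/\sqrt{n}} > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\right\} = P\{Z > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\},$$

Fig. 1. Rejection region of $H_0$ in Example 5 (accept $H_0$ for $\bar{x} \le \mu_0 + z_\alpha/\sqrt{n}$; reject $H_0$ for $\bar{x} > \mu_0 + z_\alpha/\sqrt{n}$).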
where $Z \sim \mathcal{N}(0, 1)$. In particular, $E_{\mu_1}\varphi(\mathbf{X}) > \alpha$ since $\mu_1 > \mu_0$. The probability of type II error is given by

$$P\{\text{type II error}\} = 1 - E_{\mu_1}\varphi(\mathbf{X}) = P\{Z \le z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}.$$

Figure 2 gives a graph of the power function $\beta_\varphi(\mu)$ of $\varphi$ for $\mu > 0$ when $\mu_0 = 0$ and $H_1$: $\mu > 0$.
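The critical value and power formulas of Example 5 are easy to evaluate numerically. The following Python sketch is an illustration, not part of the original text: it uses the standard library's `statistics.NormalDist` for $\Phi$ and its inverse, the function names are ours, and the values $\mu_0 = 0$, $\mu_1 = 0.5$, $n = 25$, $\alpha = 0.05$ are arbitrary choices.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def critical_value(mu0: float, n: int, alpha: float) -> float:
    """k = mu0 + z_alpha / sqrt(n); the test rejects H0 when xbar > k."""
    z_alpha = Z.inv_cdf(1 - alpha)  # upper alpha point of N(0, 1)
    return mu0 + z_alpha / sqrt(n)


def power(mu: float, mu0: float, n: int, alpha: float) -> float:
    """E_mu phi(X) = P{Z > z_alpha - sqrt(n) (mu - mu0)} when X_i ~ N(mu, 1)."""
    z_alpha = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_alpha - sqrt(n) * (mu - mu0))


if __name__ == "__main__":
    mu0, mu1, n, alpha = 0.0, 0.5, 25, 0.05  # illustrative values only
    print("k          =", round(critical_value(mu0, n, alpha), 4))
    print("size       =", round(power(mu0, mu0, n, alpha), 4))
    print("power      =", round(power(mu1, mu0, n, alpha), 4))
    print("P(type II) =", round(1 - power(mu1, mu0, n, alpha), 4))
```

Evaluating `power` at $\mu = \mu_0$ returns the size $\alpha$, and it increases with $\mu$, which is the shape plotted in Figure 2.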


Example 6. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $b(1, p)$, where $p$ is unknown and $0 \le p \le 1$. Consider the simple null hypothesis $H_0$: $X_i \sim b(1, \frac{1}{2})$, that is, under $H_0$, $p = \frac{1}{2}$. Then $H_1$: $X_i \sim b(1, p)$, $p \ne \frac{1}{2}$. A reasonable procedure would be to compute the average number of 1's, namely, $\bar{X} = \sum_1^5 X_i/5$, and to accept $H_0$ if $|\bar{X} - \frac{1}{2}| \le c$, where $c$ is to be determined. Let $\alpha = 0.10$. Then we would like to choose $c$ such that the size of our test is $\alpha$, that is,


$$0.10 = P_{p=1/2}\left\{\left|\bar{X} - \tfrac{1}{2}\right| > c\right\}$$

or

(6) $$0.90 = P_{p=1/2}\left\{-5c \le \sum_1^5 X_i - \tfrac{5}{2} \le 5c\right\} = P_{p=1/2}\left\{-k \le \sum_1^5 X_i - \tfrac{5}{2} \le k\right\},$$

where $k = 5c$. Now $\sum_1^5 X_i \sim b(5, \frac{1}{2})$ under $H_0$, so that the PMF of $\sum_1^5 X_i - \frac{5}{2}$ is given in the following table:

  $\sum_1^5 x_i$    $\sum_1^5 x_i - \frac{5}{2}$    $P_{p=1/2}\{\sum_1^5 X_i = \sum_1^5 x_i\}$
        0                  -2.5                         0.03125
        1                  -1.5                         0.15625
        2                  -0.5                         0.31250
        3                   0.5                         0.31250
        4                   1.5                         0.15625
        5                   2.5                         0.03125

Fig. 2. Power function of $\varphi$ in Example 5.


Note that we cannot choose any $k$ to satisfy (6) exactly. It is clear that we have to reject $H_0$ when $\sum_1^5 X_i - \frac{5}{2} = \pm 2.5$, that is, when we observe $\sum X_i = 0$ or 5. The resulting size if we use this test is $\alpha = 0.03125 + 0.03125 = 0.0625 < 0.10$. A second procedure would be to reject $H_0$ if $\sum X_i - \frac{5}{2} = \pm 1.5$ or $\pm 2.5$ ($\sum X_i = 0, 1, 4, 5$), in which case the resulting size is $\alpha = 0.0625 + 2(0.15625) = 0.375$, which is considerably larger than 0.10. If we insist on achieving $\alpha = 0.10$, a third alternative is to randomize on the boundary. Instead of accepting or rejecting $H_0$ with probability 1 when $\sum X_i = 1$ or 4, we reject $H_0$ with probability $\gamma$, where

$$0.10 = P_{p=1/2}\left\{\sum_1^5 X_i = 0 \text{ or } 5\right\} + \gamma\, P_{p=1/2}\left\{\sum_1^5 X_i = 1 \text{ or } 4\right\}.$$

Thus

$$\gamma = \frac{0.10 - 0.0625}{0.3125} = \frac{0.0375}{0.3125} = 0.12.$$

A randomized test of size $\alpha = 0.10$ is therefore given by

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum_1^5 x_i = 0 \text{ or } 5,\\ 0.12 & \text{if } \sum_1^5 x_i = 1 \text{ or } 4,\\ 0 & \text{otherwise.} \end{cases}$$

Fig. 3. Power function of $\varphi$ in Example 6.

The power of this test is

$$E_p\varphi(\mathbf{X}) = P_p\left\{\sum_1^5 X_i = 0 \text{ or } 5\right\} + 0.12\, P_p\left\{\sum_1^5 X_i = 1 \text{ or } 4\right\},$$

where $p \ne \frac{1}{2}$, and it can be computed for any value of $p$. Figure 3 gives a graph of $\beta_\varphi(p)$.
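The randomization constant and the power function of Example 6 can be reproduced with a short sketch. It is our own illustration (the names `pmf`, `phi`, and `power` are hypothetical), using only `math.comb` from the standard library.

```python
from math import comb

N = 5  # sample size in Example 6


def pmf(s: int, p: float) -> float:
    """P_p{sum X_i = s} for sum X_i ~ b(5, p)."""
    return comb(N, s) * p**s * (1 - p) ** (N - s)


# alpha = P{S in {0, 5}} + gamma * P{S in {1, 4}} under p = 1/2
alpha = 0.10
outer = pmf(0, 0.5) + pmf(5, 0.5)  # P{S = 0 or 5}
inner = pmf(1, 0.5) + pmf(4, 0.5)  # P{S = 1 or 4}
gamma = (alpha - outer) / inner


def phi(s: int) -> float:
    """Randomized test: probability of rejecting H0 when sum x_i = s."""
    if s in (0, 5):
        return 1.0
    if s in (1, 4):
        return gamma
    return 0.0


def power(p: float) -> float:
    """E_p phi(X) = sum over s of phi(s) * P_p{S = s}."""
    return sum(phi(s) * pmf(s, p) for s in range(N + 1))


if __name__ == "__main__":
    print("gamma =", round(gamma, 4))
    print("size  =", round(power(0.5), 4))
    for p in (0.1, 0.3, 0.7, 0.9):
        print(f"power at p = {p}: {power(p):.4f}")
```

Tabulating `power` over a grid of $p$ values gives the curve sketched in Figure 3; it equals 0.10 at $p = \frac{1}{2}$ and rises toward 1 at either end.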
We conclude this section with the following remarks. 


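Remark 1. The problem of testing of hypotheses may be considered as a special case of the general decision problem described in Section 8.8. Let $\mathcal{A} = \{a_0, a_1\}$, where $a_0$ represents the decision to accept $H_0$: $\theta \in \Theta_0$, and $a_1$ represents the decision to reject $H_0$. A decision function $\delta$ is a mapping of $\mathcal{R}_n$ into $\mathcal{A}$. Let us introduce the following loss functions:

$$L_1(\theta, a_1) = \begin{cases} 1 & \text{if } \theta \in \Theta_0,\\ 0 & \text{if } \theta \in \Theta_1, \end{cases} \qquad L_1(\theta, a_0) = 0 \text{ for all } \theta,$$

and

$$L_2(\theta, a_0) = \begin{cases} 0 & \text{if } \theta \in \Theta_0,\\ 1 & \text{if } \theta \in \Theta_1, \end{cases} \qquad L_2(\theta, a_1) = 0 \text{ for all } \theta.$$

Then the minimization of $E_\theta L_2(\theta, \delta(\mathbf{X}))$ subject to $E_\theta L_1(\theta, \delta(\mathbf{X})) \le \alpha$ is the hypothesis-testing problem discussed above. We have

$$E_\theta L_2(\theta, \delta(\mathbf{X})) = P_\theta\{\delta(\mathbf{X}) = a_0\}, \quad \theta \in \Theta_1,$$
$$= P_\theta\{\text{accept } H_0 \mid H_1 \text{ true}\},$$

and

$$E_\theta L_1(\theta, \delta(\mathbf{X})) = P_\theta\{\delta(\mathbf{X}) = a_1\}, \quad \theta \in \Theta_0,$$
$$= P_\theta\{\text{reject } H_0 \mid \theta \in \Theta_0 \text{ true}\}.$$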


Remark 2. In Example 6 we saw that the size $\alpha$ chosen is often unattainable. The choice of a specific value of $\alpha$ is completely arbitrary and is determined by nonstatistical considerations such as the possible consequences of rejecting $H_0$ falsely and the economic and practical implications of the decision to reject $H_0$. An alternative and somewhat subjective approach wherever possible is to report the P-value of the test statistic observed. This is the smallest level $\alpha$ at which the sample statistic observed is significant. In Example 6, let $S = \sum_{i=1}^5 X_i$. If $S = 0$ is observed, then $P_{H_0}(S = 0) = P_{1/2}(S = 0) = 0.03125$. By symmetry, if we reject $H_0$ for $S = 0$, we should also do so for $S = 5$, so the probability of interest is $P_{1/2}(S = 0 \text{ or } 5) = 0.0625$, which is the P-value. If $S = 1$ is observed and we decide to reject $H_0$, we would also do so for $S = 0$ because $S = 0$ is more extreme than $S = 1$. By symmetry considerations,

$$\text{P-value} = P_{1/2}(S \le 1 \text{ or } S \ge 4) = 2(0.03125 + 0.15625) = 0.375.$$


This discussion motivates Definition 9 below. Suppose that the appropriate critical region for testing $H_0$ against $H_1$ is one-sided. That is, suppose that $C$ is either of the form $\{T \ge c_1\}$ or $\{T \le c_2\}$, where $T$ is the test statistic.


Definition 9. The probability of observing under $H_0$ a sample outcome at least as extreme as the one observed is called the P-value. The smaller the P-value, the more extreme the outcome and the stronger the evidence against $H_0$.


If $\alpha$ is given, we reject $H_0$ if $P \le \alpha$ and do not reject $H_0$ if $P > \alpha$. In the two-sided case when the critical region is of the form $C = \{|T(\mathbf{X})| > k\}$, the one-sided P-value is doubled to obtain the P-value. If the distribution of $T$ is not symmetric, the P-value is not well defined in the two-sided case, although many authors recommend doubling the one-sided P-value.
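The two P-values computed in Remark 2 can be checked with a small sketch. The helper `two_sided_p_value` is hypothetical: it doubles the smaller one-sided tail of $b(n, p_0)$ and caps the result at 1. Because $b(5, \frac{1}{2})$ is symmetric, this coincides with the text's computation of $P_{1/2}(S \le 1 \text{ or } S \ge 4)$.

```python
from math import comb


def binom_pmf(s: int, n: int, p: float) -> float:
    """P{S = s} for S ~ b(n, p)."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)


def two_sided_p_value(s_obs: int, n: int = 5, p0: float = 0.5) -> float:
    """Two-sided P-value for S = sum X_i under H0: p = p0, obtained by
    doubling the smaller one-sided tail probability (capped at 1)."""
    lower = sum(binom_pmf(s, n, p0) for s in range(0, s_obs + 1))  # P{S <= s_obs}
    upper = sum(binom_pmf(s, n, p0) for s in range(s_obs, n + 1))  # P{S >= s_obs}
    return min(1.0, 2 * min(lower, upper))


if __name__ == "__main__":
    for s in range(6):
        print(f"S = {s}: P-value = {two_sided_p_value(s):.4f}")
```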


PROBLEMS 9.2 


1. A sample of size 1 is taken from a population distribution $P(\lambda)$. To test $H_0$: $\lambda = 1$ against $H_1$: $\lambda = 2$, consider the nonrandomized test $\varphi(x) = 1$ if $x > 3$, and $= 0$ if $x \le 3$. Find the probabilities of type I and type II errors and the power of the test against $\lambda = 2$. If it is required to achieve a size equal to 0.05, how should one modify the test $\varphi$?


2. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with finite mean $\mu$ and finite variance $\sigma^2$. Suppose that $\mu$ is not known but $\sigma$ is known, and it is required to test $\mu = \mu_0$ against $\mu = \mu_1$ ($\mu_1 > \mu_0$). Let $n$ be sufficiently large so that the central limit theorem holds, and consider the test

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar{x} > k,\\ 0 & \text{if } \bar{x} \le k, \end{cases}$$

where $\bar{x} = n^{-1}\sum_1^n x_i$. Find $k$ such that the test has (approximately) size $\alpha$. What is the power of this test at $\mu = \mu_1$? If the probabilities of type I and type II errors are fixed at $\alpha$ and $\beta$, respectively, find the smallest sample size needed.


3. In Problem 2, if $\sigma$ is not known, find $k$ such that the test $\varphi$ has size $\alpha$.


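4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. For testing $\mu \le \mu_0$ against $\mu > \mu_0$, consider the test function

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar{x} \ge \mu_0 + \dfrac{z_\alpha}{\sqrt{n}},\\[4pt] 0 & \text{if } \bar{x} < \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}. \end{cases}$$

Show that the power function of $\varphi$ is a nondecreasing function of $\mu$. What is the size of the test?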


5. A sample of size 1 is taken from an exponential PDF with parameter $\theta$, that is, $X \sim G(1, \theta)$. To test $H_0$: $\theta = 1$ against $H_1$: $\theta > 1$, the test to be used is the nonrandomized test

$$\varphi(x) = \begin{cases} 1 & \text{if } x > 2,\\ 0 & \text{if } x \le 2. \end{cases}$$

Find the size of the test. What is the power function?


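6. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. To test $H_0$: $\sigma = \sigma_0$ against $H_1$: $\sigma \ne \sigma_0$, it is suggested that the test

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum x_i^2 > c_1 \text{ or } \sum x_i^2 < c_2,\\ 0 & \text{if } c_2 \le \sum x_i^2 \le c_1, \end{cases}$$

be used. How will you find $c_1$ and $c_2$ such that the size of $\varphi$ is a preassigned number $\alpha$, $0 < \alpha < 1$? What is the power function of this test?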


7. An urn contains 10 marbles, of which $M$ are white and $10 - M$ are black. To test that $M = 5$ against the alternative hypothesis that $M = 6$, one draws 3 marbles from the urn without replacement. The null hypothesis is rejected if the sample contains 2 or 3 white marbles; otherwise, it is accepted. Find the size of the test and its power.


9.3 NEYMAN-PEARSON LEMMA 


In this section we prove the fundamental lemma due to Neyman and Pearson [74], which gives a general method for finding a best (most powerful) test of a simple hypothesis against a simple alternative. Let $\{f_\theta, \theta \in \Theta\}$, where $\Theta = \{\theta_0, \theta_1\}$, be a family of possible distributions of $X$. Also, $f_\theta$ represents the PDF of $X$ if $X$ is a continuous RV, and the PMF of $X$ if $X$ is of the discrete type. Let us write $f_0(x) = f_{\theta_0}(x)$ and $f_1(x) = f_{\theta_1}(x)$ for convenience.


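Theorem 1 (Neyman-Pearson Fundamental Lemma)

(a) Any test $\varphi$ of the form

(1) $$\varphi(x) = \begin{cases} 1 & \text{if } f_1(x) > k f_0(x),\\ \gamma(x) & \text{if } f_1(x) = k f_0(x),\\ 0 & \text{if } f_1(x) < k f_0(x), \end{cases}$$

for some $k \ge 0$ and $0 \le \gamma(x) \le 1$, is most powerful of its size for testing $H_0$: $\theta = \theta_0$ against $H_1$: $\theta = \theta_1$. If $k = \infty$, the test

(2) $$\varphi(x) = \begin{cases} 1 & \text{if } f_0(x) = 0,\\ 0 & \text{if } f_0(x) > 0, \end{cases}$$

is most powerful of size 0 for testing $H_0$ against $H_1$.

(b) Given $\alpha$, $0 \le \alpha \le 1$, there exists a test of form (1) or (2) with $\gamma(x) = \gamma$ (a constant) for which $E_{\theta_0}\varphi(X) = \alpha$.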


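Proof. Let $\varphi$ be a test satisfying (1) and $\varphi^*$ be any test with $E_{\theta_0}\varphi^*(X) \le E_{\theta_0}\varphi(X)$. In the continuous case

$$\int [\varphi(x) - \varphi^*(x)][f_1(x) - k f_0(x)]\,dx = \left(\int_{f_1 > k f_0} + \int_{f_1 < k f_0}\right)[\varphi(x) - \varphi^*(x)][f_1(x) - k f_0(x)]\,dx.$$

For any $x \in \{f_1(x) > k f_0(x)\}$, $\varphi(x) - \varphi^*(x) = 1 - \varphi^*(x) \ge 0$, so that the integrand is $\ge 0$. For $x \in \{f_1(x) < k f_0(x)\}$, $\varphi(x) - \varphi^*(x) = -\varphi^*(x) \le 0$, so that the integrand is again $\ge 0$. It follows that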
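$$\int [\varphi(x) - \varphi^*(x)][f_1(x) - k f_0(x)]\,dx = E_{\theta_1}\varphi(X) - E_{\theta_1}\varphi^*(X) - k\left(E_{\theta_0}\varphi(X) - E_{\theta_0}\varphi^*(X)\right) \ge 0,$$

which implies that

$$E_{\theta_1}\varphi(X) - E_{\theta_1}\varphi^*(X) \ge k\left(E_{\theta_0}\varphi(X) - E_{\theta_0}\varphi^*(X)\right) \ge 0$$

since $E_{\theta_0}\varphi^*(X) \le E_{\theta_0}\varphi(X)$.

If $k = \infty$, any test $\varphi^*$ of size 0 must vanish on the set $\{f_0(x) > 0\}$. We have

$$E_{\theta_1}\varphi(X) - E_{\theta_1}\varphi^*(X) = \int_{\{f_0(x) = 0\}} [1 - \varphi^*(x)]\, f_1(x)\,dx \ge 0.$$

The proof for the discrete case requires the usual change of integral by a sum throughout.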

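To prove (b) we need to restrict ourselves to the case where $0 < \alpha \le 1$, since the MP size 0 test is given by (2). Let $\gamma(x) = \gamma$, and let us compute the size of a test of form (1). We have

$$E_{\theta_0}\varphi(X) = P_{\theta_0}\{f_1(X) > k f_0(X)\} + \gamma P_{\theta_0}\{f_1(X) = k f_0(X)\} = 1 - P_{\theta_0}\{f_1(X) \le k f_0(X)\} + \gamma P_{\theta_0}\{f_1(X) = k f_0(X)\}.$$

Since $P_{\theta_0}\{f_0(X) = 0\} = 0$, we may rewrite $E_{\theta_0}\varphi(X)$ as

(3) $$E_{\theta_0}\varphi(X) = 1 - P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} \le k\right\} + \gamma P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} = k\right\}.$$

Given $0 < \alpha \le 1$, we wish to find $k$ and $\gamma$ such that $E_{\theta_0}\varphi(X) = \alpha$, that is,

(4) $$P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} \le k\right\} - \gamma P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} = k\right\} = 1 - \alpha.$$

Note that

$$P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} \le k\right\}$$

is a DF so that it is a nondecreasing and right continuous function of $k$. If there exists a $k_0$ such that

$$P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} \le k_0\right\} = 1 - \alpha,$$

we choose $\gamma = 0$ and $k = k_0$. Otherwise, there exists a $k_0$ such that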
(5) $$P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} < k_0\right\} \le 1 - \alpha < P_{\theta_0}\left\{\frac{f_1(X)}{f_0(X)} \le k_0\right\},$$

that is, there is a jump at $k_0$ (see Fig. 1). In this case we choose $k = k_0$ and

(6) $$\gamma = \frac{P_{\theta_0}\{f_1(X)/f_0(X) \le k_0\} - (1 - \alpha)}{P_{\theta_0}\{f_1(X)/f_0(X) = k_0\}}.$$

Since $\gamma$ given by (6) satisfies (4), and $0 \le \gamma \le 1$, the proof is complete.
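Part (b) is constructive, and on a finite sample space the construction can be carried out mechanically. The Python sketch below is our own illustration (the names `np_constants` and `size` are hypothetical). It assumes $f_0(x) > 0$ at every point, so that $\lambda(x) = f_1(x)/f_0(x)$ is finite, and uses exact `fractions.Fraction` arithmetic so that ties $\lambda(x) = k$ are detected exactly. Scanning the distinct values of $\lambda$ from the largest down, it stops at the first $k$ with $P_{\theta_0}\{\lambda(X) \ge k\} \ge \alpha$ and sets $\gamma = (\alpha - P_{\theta_0}\{\lambda(X) > k\})/P_{\theta_0}\{\lambda(X) = k\}$, which is (6) rewritten using $P_{\theta_0}\{\lambda \le k\} = 1 - P_{\theta_0}\{\lambda > k\}$. When (4) holds with $\gamma = 0$ at some $k_0$, the sketch instead returns the next larger value of $\lambda$ with $\gamma = 1$, which defines the same test.

```python
from fractions import Fraction


def np_constants(f0, f1, alpha):
    """Return (k, gamma) so that the test of form (1) has size alpha under f0.

    Assumes a finite sample space with f0(x) > 0 at every point.
    """
    ratios = [b / a for a, b in zip(f0, f1)]  # lambda(x) = f1(x) / f0(x)
    for k in sorted(set(ratios), reverse=True):
        p_gt = sum(a for a, r in zip(f0, ratios) if r > k)   # P0{lambda(X) > k}
        p_eq = sum(a for a, r in zip(f0, ratios) if r == k)  # P0{lambda(X) = k}
        if p_gt + p_eq >= alpha:  # first k from the top with P0{lambda >= k} >= alpha
            return k, (alpha - p_gt) / p_eq
    raise ValueError("alpha must lie in (0, 1]")


def size(f0, f1, k, gamma):
    """E_0 phi(X) for the test of form (1) with constant gamma."""
    total = Fraction(0)
    for a, b in zip(f0, f1):
        r = b / a
        total += a if r > k else (gamma * a if r == k else 0)
    return total


if __name__ == "__main__":
    # hypothetical four-point example
    f0 = [Fraction(1, 4)] * 4
    f1 = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
    k, g = np_constants(f0, f1, Fraction(3, 10))
    print("k =", k, " gamma =", g, " size =", size(f0, f1, k, g))
```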
Remark 1. It is possible to show (see Problem 6) that the test given by (1) or (2) is unique (except on a null set), that is, if $\varphi$ is an MP test of size $\alpha$ of $H_0$ against $H_1$, it must have form (1) or (2), except perhaps for a set $A$ with $P_{\theta_0}(A) = P_{\theta_1}(A) = 0$.

Remark 2. An analysis of the proof of part (a) of Theorem 1 shows that test (1) is MP even if $f_1$ and $f_0$ are not necessarily densities.


Theorem 2. If a sufficient statistic $T$ exists for the family $\{f_\theta: \theta \in \Theta\}$, $\Theta = \{\theta_0, \theta_1\}$, the Neyman-Pearson MP test is a function of $T$.


The proof of this result is left as an exercise. 


Remark 3. If the family $\{f_\theta: \theta \in \Theta\}$ admits a sufficient statistic, one can restrict attention to tests based on the sufficient statistic, that is, to tests that are functions of the sufficient statistic. If $\varphi$ is a test function and $T$ is a sufficient statistic, $E\{\varphi(X) \mid T\}$ is itself a test function, $0 \le E\{\varphi(X) \mid T\} \le 1$, and

$$E_\theta\{E\{\varphi(X) \mid T\}\} = E_\theta\varphi(X),$$

so that $\varphi$ and $E\{\varphi \mid T\}$ have the same power function.


Example 1. Let $X$ be an RV with PMF under $H_0$ and $H_1$ given by

  x        1      2      3      4      5      6
  f_0(x)   0.01   0.01   0.01   0.01   0.01   0.95
  f_1(x)   0.05   0.04   0.03   0.02   0.01   0.85

Then $\lambda(x) = f_1(x)/f_0(x)$ is given by

  x        1      2      3      4      5      6
  λ(x)     5      4      3      2      1      0.89

If $\alpha = 0.03$, for example, then the Neyman-Pearson MP size 0.03 test rejects $H_0$ if $\lambda(X) \ge 3$, that is, if $X \le 3$, and has power

$$P_1(X \le 3) = 0.05 + 0.04 + 0.03 = 0.12$$

with $P(\text{type II error}) = 1 - 0.12 = 0.88$.
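The numbers in Example 1 can be verified with exact fractions. This is an illustrative sketch (the variable names are ours) that recomputes $\lambda(x)$, the rejection set $\{\lambda(X) \ge 3\}$, the size, and the power directly from the table.

```python
from fractions import Fraction as F

xs = [1, 2, 3, 4, 5, 6]
f0 = [F(1, 100)] * 5 + [F(95, 100)]                                        # PMF under H0
f1 = [F(5, 100), F(4, 100), F(3, 100), F(2, 100), F(1, 100), F(85, 100)]  # PMF under H1

lam = {x: b / a for x, a, b in zip(xs, f0, f1)}  # lambda(x) = f1(x) / f0(x)

# MP size 0.03 test: reject H0 when lambda(X) >= 3, i.e. when X <= 3
reject = [x for x in xs if lam[x] >= 3]
size = sum(a for x, a in zip(xs, f0) if x in reject)
power = sum(b for x, b in zip(xs, f1) if x in reject)

if __name__ == "__main__":
    print("lambda:", {x: round(float(r), 4) for x, r in lam.items()})
    print("reject when X in", reject, "| size =", float(size), "| power =", float(power))
```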


Example 2. Let $X \sim \mathcal{N}(0, 1)$ under $H_0$ and $X \sim C(1, 0)$ under $H_1$. To find an MP size $\alpha$ test of $H_0$ against $H_1$, consider

$$\lambda(x) = \frac{f_1(x)}{f_0(x)} = \frac{1/[\pi(1 + x^2)]}{(1/\sqrt{2\pi})\,e^{-x^2/2}} = \sqrt{\frac{2}{\pi}}\,\frac{e^{x^2/2}}{1 + x^2}.$$

Figure 2 gives a graph of $\lambda(x)$, and we note that $\lambda$ has a local maximum at $x = 0$ and two minima at $x = \pm 1$. Note that $\lambda(0) = 0.7979$ and $\lambda(\pm 1) = 0.6578$, so for $k \in (0.6578, 0.7979)$, the line $\lambda(x) = k$ intersects the graph at four points and the critical region is of the form $|X| < k_1$ or $|X| > k_2$, where $k_1$ and $k_2$ are solutions of $\lambda(x) = k$. For $k = 0.7979$, the critical region is of the form $|X| > k_0$, where $k_0$ is the positive solution of $e^{x^2/2} = 1 + x^2$, so that $k_0 \approx 1.59$ with $\alpha = 0.1118$. For $k < 0.6578$, $\alpha = 1$, and for $k = 0.6578$, the critical region is $|X| > 1$ with $\alpha = 0.3413$. For the traditional level $\alpha = 0.05$, the critical region is of the form $|X| > 1.96$.

Fig. 2. Graph of $\lambda(x) = (2/\pi)^{1/2}[\exp(x^2/2)/(1 + x^2)]$, with the levels $\lambda(0) = 0.7979$ and $\lambda(1) = 0.6578$ marked.
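The constants quoted in Example 2 can be reproduced with a few lines of Python. This is an illustrative sketch (the function names are ours) that solves $e^{x^2/2} = 1 + x^2$ by bisection on $[1, 3]$ and uses `statistics.NormalDist` for the size. Solving without first rounding $k_0$ gives $k_0 \approx 1.585$ and a size near 0.113, slightly above the 0.1118 obtained with the rounded value 1.59.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def lam(x: float) -> float:
    """Likelihood ratio f1(x)/f0(x) for C(1, 0) against N(0, 1)."""
    return sqrt(2 / pi) * exp(x * x / 2) / (1 + x * x)


def solve_k0(lo: float = 1.0, hi: float = 3.0, tol: float = 1e-12) -> float:
    """Bisection for the root of exp(x^2/2) = 1 + x^2 in [1, 3], where lam(x) = lam(0)."""
    def g(x: float) -> float:
        return exp(x * x / 2) - 1 - x * x

    # g(1) < 0 < g(3); the loop keeps g(lo) < 0 <= g(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k0 = solve_k0()
alpha_k0 = 2 * (1 - NormalDist().cdf(k0))  # P{|X| > k0} under N(0, 1)

if __name__ == "__main__":
    print(f"lambda(0) = {lam(0):.4f}, lambda(1) = {lam(1):.4f}")
    print(f"k0 = {k0:.4f}, size of {{|X| > k0}} = {alpha_k0:.4f}")
```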


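Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let $H_0$: $p = p_0$, $H_1$: $p = p_1$, $p_1 > p_0$. The MP size $\alpha$ test of $H_0$ against $H_1$ is of the form

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1, & \lambda(\mathbf{x}) = \dfrac{p_1^{\sum x_i}(1 - p_1)^{n - \sum x_i}}{p_0^{\sum x_i}(1 - p_0)^{n - \sum x_i}} > k,\\[6pt] \gamma, & \lambda(\mathbf{x}) = k,\\ 0, & \lambda(\mathbf{x}) < k, \end{cases}$$

where $k$ and $\gamma$ are determined from $E_{p_0}\varphi(\mathbf{X}) = \alpha$. Now

$$\lambda(\mathbf{x}) = \left(\frac{p_1}{p_0}\right)^{\sum x_i}\left(\frac{1 - p_1}{1 - p_0}\right)^{n - \sum x_i},$$

and since $p_1 > p_0$, $\lambda(\mathbf{x})$ is an increasing function of $\sum x_i$. It follows that $\lambda(\mathbf{x}) > k$ if and only if $\sum x_i > k_1$, where $k_1$ is a constant. Thus the MP size $\alpha$ test is of the form

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum x_i > k_1,\\ \gamma & \text{if } \sum x_i = k_1,\\ 0 & \text{otherwise.} \end{cases}$$

Also, $k_1$ and $\gamma$ are determined from

$$\alpha = E_{p_0}\varphi(\mathbf{X}) = P_{p_0}\left\{\sum_1^n X_i > k_1\right\} + \gamma P_{p_0}\left\{\sum_1^n X_i = k_1\right\} = \sum_{r = k_1 + 1}^{n}\binom{n}{r}p_0^r(1 - p_0)^{n - r} + \gamma\binom{n}{k_1}p_0^{k_1}(1 - p_0)^{n - k_1}.$$

Note that the MP size $\alpha$ test is independent of $p_1$ as long as $p_1 > p_0$; that is, it remains an MP size $\alpha$ test against any $p > p_0$ and is therefore a UMP test of $p = p_0$ against $p > p_0$.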

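In particular, let $n = 5$, $p_0 = \frac{1}{2}$, $p_1 = \frac{3}{4}$, and $\alpha = 0.05$. Then the MP test is given by

$$\varphi(\mathbf{x}) = \begin{cases} 1, & \sum x_i > k,\\ \gamma, & \sum x_i = k,\\ 0, & \sum x_i < k, \end{cases}$$

where $k$ and $\gamma$ are determined from

$$0.05 = \sum_{r = k + 1}^{5}\binom{5}{r}\left(\frac{1}{2}\right)^5 + \gamma\binom{5}{k}\left(\frac{1}{2}\right)^5.$$

It follows that $k = 4$ and $\gamma = (0.05 - 0.03125)/0.15625 = 0.12$. Thus the MP size $\alpha = 0.05$ test is to reject $p = \frac{1}{2}$ in favor of $p = \frac{3}{4}$ if $\sum_1^5 X_i = 5$ and to reject $p = \frac{1}{2}$ with probability 0.12 if $\sum_1^5 X_i = 4$.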
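For the numerical case above, a short sketch finds $k$ and $\gamma$ by accumulating the upper tail of $b(n, p_0)$ from the top until it first reaches $\alpha$. The helper name `ump_binomial_upper` is ours, not the text's.

```python
from math import comb


def ump_binomial_upper(n: int, p0: float, alpha: float):
    """Find (k, gamma) with P{T > k} + gamma * P{T = k} = alpha, T ~ b(n, p0)."""
    pmf = [comb(n, t) * p0**t * (1 - p0) ** (n - t) for t in range(n + 1)]
    tail = 0.0  # running value of P{T > k}
    for k in range(n, -1, -1):
        if tail + pmf[k] >= alpha:
            return k, (alpha - tail) / pmf[k]
        tail += pmf[k]
    raise ValueError("alpha must lie in (0, 1]")


if __name__ == "__main__":
    k, gamma = ump_binomial_upper(5, 0.5, 0.05)
    print("k =", k, " gamma =", round(gamma, 4))
```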
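It is simply a matter of reversing inequalities to see that the MP size $\alpha$ test of $H_0$: $p = p_0$ against $H_1$: $p = p_1$ ($p_1 < p_0$) is given by

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum x_i < k,\\ \gamma & \text{if } \sum x_i = k,\\ 0 & \text{if } \sum x_i > k, \end{cases}$$

where $\gamma$ and $k$ are determined from $E_{p_0}\varphi(\mathbf{X}) = \alpha$.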
We note that $T(\mathbf{X}) = \sum X_i$ is minimal sufficient for $p$, so that in view of Remark 3, we could have considered tests based only on $T$. Since $T \sim b(n, p)$,

$$\frac{f_1(t)}{f_0(t)} = \frac{\binom{n}{t}p_1^t(1 - p_1)^{n - t}}{\binom{n}{t}p_0^t(1 - p_0)^{n - t}} = \left(\frac{p_1}{p_0}\right)^t\left(\frac{1 - p_1}{1 - p_0}\right)^{n - t},$$

so that an MP test is of the same form as above but the computation is somewhat simpler.
We remark that in both cases ($p_1 > p_0$, $p_1 < p_0$) the MP test is quite intuitive. We would tend to accept the larger probability if a larger number of "successes" showed up, and the smaller probability if a smaller number of "successes" were observed. See, however, Example 2.


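Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. We wish to test the null hypothesis $H_0$: $\mu = \mu_0$, $\sigma = \sigma_0$ against the alternative $H_1$: $\mu = \mu_1$, $\sigma = \sigma_0$. The fundamental lemma leads to the following MP test:

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \lambda(\mathbf{x}) > k,\\ 0 & \text{if } \lambda(\mathbf{x}) \le k, \end{cases}$$

where

$$\lambda(\mathbf{x}) = \frac{(1/\sigma_0\sqrt{2\pi})^n \exp\{-\sum (x_i - \mu_1)^2/2\sigma_0^2\}}{(1/\sigma_0\sqrt{2\pi})^n \exp\{-\sum (x_i - \mu_0)^2/2\sigma_0^2\}}$$

and $k$ is determined from $E_{\mu_0, \sigma_0}\varphi(\mathbf{X}) = \alpha$. We have

$$\lambda(\mathbf{x}) = \exp\left\{\sum x_i\left(\frac{\mu_1}{\sigma_0^2} - \frac{\mu_0}{\sigma_0^2}\right) + n\left(\frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_0^2}\right)\right\}.$$

If $\mu_1 > \mu_0$, then

$$\lambda(\mathbf{x}) > k \quad \text{if and only if} \quad \sum_{i=1}^n x_i > k',$$

where $k'$ is determined from

$$P_{\mu_0, \sigma_0}\left\{\sum_{i=1}^n X_i > k'\right\} = P\left\{\frac{\sum_{i=1}^n X_i - n\mu_0}{\sqrt{n}\,\sigma_0} > \frac{k' - n\mu_0}{\sqrt{n}\,\sigma_0}\right\} = \alpha,$$

giving $k' = z_\alpha\sqrt{n}\,\sigma_0 + n\mu_0$. The case $\mu_1 < \mu_0$ is treated similarly. If $\sigma_0$ is known, the test determined above is independent of $\mu_1$ as long as $\mu_1 > \mu_0$, and it follows that the test is UMP against $H_1$: $\mu > \mu_0$, $\sigma = \sigma_0$. If, however, $\sigma_0$ is not known, that is, the null hypothesis is a composite hypothesis $H_0$: $\mu = \mu_0$, $\sigma^2 > 0$ to be tested against the alternatives $H_1$: $\mu = \mu_1$, $\sigma^2 > 0$ ($\mu_1 > \mu_0$), the MP test determined above depends on $\sigma_0$. In other words, an MP test against the alternative $(\mu_1, \sigma_0)$ will not be MP against $(\mu_1, \sigma_1)$, where $\sigma_1 \ne \sigma_0$.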
PROBLEMS 9.3

1. A sample of size 1 is taken from PDF

$$f_\theta(x) = \begin{cases} \dfrac{2}{\theta^2}(\theta - x) & \text{if } 0 < x < \theta,\\[4pt] 0 & \text{otherwise.} \end{cases}$$

Find an MP test of $H_0$: $\theta = \theta_0$ against $H_1$: $\theta = \theta_1$ ($\theta_1 < \theta_0$).


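2. Find the Neyman-Pearson size $\alpha$ test of $H_0$: $\theta = \theta_0$ against $H_1$: $\theta = \theta_1$ ($\theta_1 < \theta_0$), based on a sample of size 1 from the PDF

$$f_\theta(x) = 2\theta x + 2(1 - \theta)(1 - x), \quad 0 < x < 1, \quad \theta \in [0, 1].$$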


3. Find the Neyman-Pearson size $\alpha$ test of $H_0$: $\beta = 1$ against $H_1$: $\beta = \beta_1$ ($> 1$), based on a sample of size 1 from

$$f(x; \beta) = \begin{cases} \beta x^{\beta - 1}, & 0 < x < 1,\\ 0, & \text{otherwise.} \end{cases}$$

4. Find an MP size $\alpha$ test of $H_0$: $X \sim f_0(x)$, where $f_0(x) = (2\pi)^{-1/2}e^{-x^2/2}$, $-\infty < x < \infty$, against $H_1$: $X \sim f_1(x)$, where $f_1(x) = 2^{-1}e^{-|x|}$, $-\infty < x < \infty$, based on a sample of size 1.


5. For the PDF $f_\theta(x) = e^{-(x - \theta)}$, $x > \theta$, find an MP size $\alpha$ test of $\theta = \theta_0$ against $\theta = \theta_1$ ($> \theta_0$), based on a sample of size $n$.


6. If $\varphi^*$ is an MP size $\alpha$ test of $H_0$: $X \sim f_0(x)$ against $H_1$: $X \sim f_1(x)$, show that it has to be either of form (1) or form (2) (except for a set of $x$ that has probability 0 under $H_0$ and $H_1$).


7. Let $\varphi^*$ be an MP size $\alpha$ ($0 < \alpha \le 1$) test of $H_0$ against $H_1$, and let $k(\alpha)$ denote the value of $k$ in (1). Show that if $\alpha_1 < \alpha_2$, then $k(\alpha_2) \le k(\alpha_1)$.


8. For the family of Neyman-Pearson tests, show that the larger the $\alpha$, the smaller the $\beta$ ($= P\{\text{type II error}\}$).


9. Let $1 - \beta$ be the power of an MP size $\alpha$ test, where $0 < \alpha < 1$. Show that $\alpha < 1 - \beta$ unless $P_{\theta_0} = P_{\theta_1}$.


10. Let $\alpha$ be a real number, $0 < \alpha < 1$, and $\varphi^*$ be an MP size $\alpha$ test of $H_0$ against $H_1$. Also, let $\beta = E_{H_1}\varphi^*(X) < 1$. Show that $1 - \varphi^*$ is an MP test for testing $H_1$ against $H_0$ at level $1 - \beta$.


11. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

$$f_\theta(x) = \frac{\theta}{x^2}, \quad 0 < \theta < x < \infty.$$

Find an MP test of $\theta = \theta_0$ against $\theta = \theta_1$ ($\ne \theta_0$).
12. Let $X$ be an observation in $(0, 1)$. Find an MP size $\alpha$ test of $H_0$: $X \sim f(x) = 4x$ if $0 < x < \frac{1}{2}$, and $= 4 - 4x$ if $\frac{1}{2} \le x < 1$, against $H_1$: $X \sim f(x) = 1$ if $0 < x < 1$. Find the power of your test.


13. In each of the following cases of simple versus simple hypotheses $H_0$: $X \sim f_0$, $H_1$: $X \sim f_1$, draw a graph of the ratio $\lambda(x) = f_1(x)/f_0(x)$ and find the form of the Neyman-Pearson test:

(a) $f_0(x) = \frac{1}{2}\exp(-|x + 1|)$; $f_1(x) = \frac{1}{2}\exp(-|x - 1|)$.
(b) $f_0(x) = \frac{1}{2}\exp(-|x|)$; $f_1(x) = 1/[\pi(1 + x^2)]$.
(c) $f_0(x) = (1/\pi)[1 + (1 + x)^2]^{-1}$; $f_1(x) = (1/\pi)[1 + (1 - x)^2]^{-1}$.
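14. Let $X_1, X_2, \ldots, X_n$ be a random sample with common PDF

$$f_\theta(x) = \frac{1}{2\theta}\exp\left(-\frac{|x|}{\theta}\right), \quad x \in \mathcal{R}, \quad \theta > 0.$$

Find a size $\alpha$ MP test for testing $H_0$: $\theta = \theta_0$ versus $H_1$: $\theta = \theta_1$ ($> \theta_0$).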
15. Let $X \sim f_j$, $j = 0, 1$, where

[Table: values of $x$, $f_0(x)$, and $f_1(x)$]

(a) Find the form of the MP test of its size.
(b) Find the size and the power of your test for various values of the cutoff point.
(c) Consider now a random sample of size $n$ from $f_0$ under $H_0$ or $f_1$ under $H_1$. Find the form of the MP test of its size.
In this section we consider the problem of testing one-sided hypotheses on a single real-valued parameter. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathcal{R}$, and suppose that we wish to test $H_0$: $\theta \le \theta_0$ against the alternatives $H_1$: $\theta > \theta_0$ or its dual, $H_0'$: $\theta \ge \theta_0$, against $H_1'$: $\theta < \theta_0$. In general, it is not possible to find a UMP test for this problem. The MP test of $H_0$: $\theta \le \theta_0$, say, against the alternative $\theta = \theta_1$ ($> \theta_0$) depends on $\theta_1$ and cannot be UMP. Here we consider a special class of distributions that is large enough to include the one-parameter exponential family, for which a UMP test of a one-sided hypothesis exists.


Definition 1. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathcal{R}$. We say that $\{f_\theta\}$ has a monotone likelihood ratio (MLR) in statistic $T(\mathbf{x})$ if for $\theta_1 < \theta_2$, whenever $f_{\theta_1}$, $f_{\theta_2}$ are distinct, the ratio $f_{\theta_2}(\mathbf{x})/f_{\theta_1}(\mathbf{x})$ is a nondecreasing function of $T(\mathbf{x})$ for the set of values $\mathbf{x}$ for which at least one of $f_{\theta_1}$ and $f_{\theta_2}$ is $> 0$.

It is also possible to define families of densities with nonincreasing MLR in $T(\mathbf{x})$, but such families can be treated by symmetry.


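Example 1. Let $X_1, X_2, \ldots, X_n \sim U[0, \theta]$, $\theta > 0$. The joint PDF of $X_1, \ldots, X_n$ is

$$f_\theta(\mathbf{x}) = \begin{cases} \dfrac{1}{\theta^n}, & 0 \le \max x_i \le \theta,\\[4pt] 0, & \text{otherwise.} \end{cases}$$

Let $\theta_2 > \theta_1$, and consider the ratio

$$\frac{f_{\theta_2}(\mathbf{x})}{f_{\theta_1}(\mathbf{x})} = \frac{(1/\theta_2^n)\, I[\max x_i \le \theta_2]}{(1/\theta_1^n)\, I[\max x_i \le \theta_1]} = \left(\frac{\theta_1}{\theta_2}\right)^n \frac{I[\max x_i \le \theta_2]}{I[\max x_i \le \theta_1]}.$$

Let

$$R(\mathbf{x}) = \frac{I[\max x_i \le \theta_2]}{I[\max x_i \le \theta_1]} = \begin{cases} 1, & \max x_i \in [0, \theta_1],\\ \infty, & \max x_i \in (\theta_1, \theta_2]. \end{cases}$$

Define $R(\mathbf{x}) = \infty$ if $\max x_i > \theta_2$. It follows that $f_{\theta_2}/f_{\theta_1}$ is a nondecreasing function of $\max_{1 \le i \le n} x_i$, and the family of uniform densities on $[0, \theta]$ has an MLR in $\max_{1 \le i \le n} X_i$.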
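As a quick numerical check of Example 1, the sketch below evaluates $f_{\theta_2}/f_{\theta_1}$ as a function of $t = \max x_i$, following the text's convention that the ratio is $\infty$ once $t > \theta_1$ (including $t > \theta_2$). The function name `uniform_lr` and the grid of values are our own choices for illustration.

```python
import math


def uniform_lr(t: float, theta1: float, theta2: float, n: int) -> float:
    """f_{theta2}(x) / f_{theta1}(x) for a U[0, theta] sample, as a function of t = max x_i >= 0.

    (theta1/theta2)^n when t <= theta1; infinite when theta1 < t <= theta2;
    defined to be infinite when t > theta2, as in the text.
    """
    if not 0 < theta1 < theta2:
        raise ValueError("need 0 < theta1 < theta2")
    if t <= theta1:
        return (theta1 / theta2) ** n
    return math.inf


if __name__ == "__main__":
    grid = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0)
    print([uniform_lr(t, 1.0, 2.0, 3) for t in grid])
```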

Theorem 1. The one-parameter exponential family

(1) $$f_\theta(x) = \exp[Q(\theta)T(x) + S(x) + D(\theta)],$$

where $Q(\theta)$ is nondecreasing, has an MLR in $T(x)$.

The proof is left as an exercise.


Remark 1. The nondecreasingness of $Q(\theta)$ can be obtained by a reparametrization, putting $\vartheta = Q(\theta)$, if necessary.

Theorem 1 includes normal, binomial, Poisson, gamma (one parameter fixed), beta (one parameter fixed), and so on. In Example 1 we have already seen that $U[0, \theta]$, which is not an exponential family, has an MLR.
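Example 2. Let $X \sim C(1, \theta)$. Then for $\theta_1 < \theta_2$

$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{1 + (x - \theta_1)^2}{1 + (x - \theta_2)^2} \to 1 \quad \text{as } x \to \pm\infty,$$

while at $x = \theta_2$ the ratio equals $1 + (\theta_2 - \theta_1)^2 > 1$. Hence the ratio cannot be monotone in $x$, and we see that $C(1, \theta)$ does not have an MLR.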


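Theorem 2. Let $X \sim f_\theta$, $\theta \in \Theta$, where $\{f_\theta\}$ has an MLR in $T(x)$. For testing $H_0$: $\theta \le \theta_0$ against $H_1$: $\theta > \theta_0$, $\theta_0 \in \Theta$, any test of the form

(2) $$\varphi(x) = \begin{cases} 1 & \text{if } T(x) > t_0,\\ \gamma & \text{if } T(x) = t_0,\\ 0 & \text{if } T(x) < t_0, \end{cases}$$

has a nondecreasing power function and is UMP of its size $E_{\theta_0}\varphi(X) = \alpha$ (provided that the size is not 0).

Moreover, for every $0 \le \alpha \le 1$ and every $\theta_0 \in \Theta$, there exists a $t_0$, $-\infty \le t_0 \le \infty$, and $0 \le \gamma \le 1$ such that the test described in (2) is the UMP size $\alpha$ test of $H_0$ against $H_1$.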


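Proof. Let $\theta_1, \theta_2 \in \Theta$, $\theta_1 < \theta_2$. By the fundamental lemma, any test of the form

(3) $$\varphi(x) = \begin{cases} 1, & \lambda(x) > k,\\ \gamma(x), & \lambda(x) = k,\\ 0, & \lambda(x) < k, \end{cases}$$

where $\lambda(x) = f_{\theta_2}(x)/f_{\theta_1}(x)$, is MP of its size for testing $\theta = \theta_1$ against $\theta = \theta_2$, provided that $0 \le k < \infty$; and if $k = \infty$, the test

(4) $$\varphi(x) = \begin{cases} 1 & \text{if } f_{\theta_1}(x) = 0,\\ 0 & \text{if } f_{\theta_1}(x) > 0, \end{cases}$$

is MP of size 0. Since $f_\theta$ has an MLR in $T$, it follows that any test of form (2) is also of form (3), provided that $E_{\theta_1}\varphi(X) > 0$, that is, provided that its size is $> 0$. With $\alpha = E_{\theta_1}\varphi(X)$, the trivial test $\varphi'(x) \equiv \alpha$ has size $\alpha$ and power $\alpha$, so that the power of any test (2) is at least $\alpha$, that is,

$$E_{\theta_2}\varphi(X) \ge E_{\theta_2}\varphi'(X) = \alpha = E_{\theta_1}\varphi(X).$$

It follows that if $\theta_1 < \theta_2$ and $E_{\theta_1}\varphi(X) > 0$, then $E_{\theta_1}\varphi(X) \le E_{\theta_2}\varphi(X)$, as asserted.

Let $\theta_1 = \theta_0$ and $\theta_2 > \theta_0$, as above. We know that (2) is an MP test of its size $E_{\theta_0}\varphi(X)$ for testing $\theta = \theta_0$ against $\theta = \theta_2$ ($\theta_2 > \theta_0$), provided that $E_{\theta_0}\varphi(X) > 0$. Since the power function of $\varphi$ is nondecreasing,

(5) $$E_\theta\varphi(X) \le E_{\theta_0}\varphi(X) = \alpha_0 \quad \text{for all } \theta \le \theta_0.$$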
Since, however, $\varphi$ does not depend on $\theta_2$ (it depends only on the constants $t_0$ and $\gamma$), it follows that $\varphi$ is the UMP size $\alpha_0$ test for testing $\theta = \theta_0$ against $\theta > \theta_0$. Thus $\varphi$ is UMP among the class of tests $\varphi''$ for which

(6) $$E_{\theta_0}\varphi''(X) \le E_{\theta_0}\varphi(X) = \alpha_0.$$

Now the class of tests satisfying (5) is contained in the class of tests satisfying (6) [there are more restrictions in (5)]. It follows that $\varphi$, which is UMP in the larger class satisfying (6), must also be UMP in the smaller class satisfying (5). Thus, provided that $\alpha_0 > 0$, $\varphi$ is the UMP size $\alpha_0$ test for $\theta \le \theta_0$ against $\theta > \theta_0$.

We ask the reader to complete the proof of the final part of the theorem, using the fundamental lemma.


Remark 2. By interchanging inequalities throughout in Theorem 2, we see that this theorem also provides a solution of the dual problem $H_0'$: $\theta \ge \theta_0$ against $H_1'$: $\theta < \theta_0$.


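Example 3. Let $X$ have the hypergeometric PMF

$$P_M\{X = x\} = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, M.$$

Since

$$\frac{P_{M+1}\{X = x\}}{P_M\{X = x\}} = \frac{M + 1}{N - M} \cdot \frac{N - M - n + x}{M + 1 - x},$$

we see that $\{P_M\}$ has an MLR in $x$ ($P_{M_2}/P_{M_1}$, where $M_2 > M_1$, is just a product of such ratios). It follows that there exists a UMP test of $H_0$: $M \le M_0$ against $H_1$: $M > M_0$, which rejects $H_0$ when $X$ is too large; that is, the UMP size $\alpha$ test is given by

$$\varphi(x) = \begin{cases} 1, & x > k,\\ \gamma, & x = k,\\ 0, & x < k, \end{cases}$$

where (integer) $k$ and $\gamma$ are determined from

$$E_{M_0}\varphi(X) = \alpha.$$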
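A sketch of how $k$ and $\gamma$ might be computed for Example 3 with exact rational arithmetic follows. The population size $N = 20$, sample size $n = 5$, $M_0 = 5$, and $\alpha = 1/20$ are hypothetical values chosen only for illustration, and the helper names are ours. The last lines check numerically that the ratio $P_{M_0+1}/P_{M_0}$ is nondecreasing in $x$ for these values, as the MLR argument predicts.

```python
from fractions import Fraction
from math import comb


def hyper_pmf(x: int, N: int, M: int, n: int) -> Fraction:
    """P_M{X = x}: number of white balls in n draws without replacement (M white of N)."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def ump_hypergeometric(N: int, n: int, M0: int, alpha: Fraction):
    """Integer k and gamma with P_{M0}{X > k} + gamma * P_{M0}{X = k} = alpha."""
    tail = Fraction(0)  # running value of P_{M0}{X > k}
    for k in range(n, -1, -1):
        p = hyper_pmf(k, N, M0, n)
        if p > 0 and tail + p >= alpha:
            return k, (alpha - tail) / p
        tail += p
    raise ValueError("alpha must lie in (0, 1]")


if __name__ == "__main__":
    N, n, M0 = 20, 5, 5  # hypothetical values
    k, gamma = ump_hypergeometric(N, n, M0, Fraction(1, 20))
    print("k =", k, " gamma =", gamma, "~", round(float(gamma), 4))
    ratios = [hyper_pmf(x, N, M0 + 1, n) / hyper_pmf(x, N, M0, n) for x in range(n + 1)]
    print("ratio nondecreasing:", all(a <= b for a, b in zip(ratios, ratios[1:])))
```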


For the one-parameter exponential family, UMP tests also exist for some two-sided hypotheses of the form

(7) $$H_0: \theta \le \theta_1 \text{ or } \theta \ge \theta_2 \quad (\theta_1 < \theta_2).$$

We state the following result without proof.
Theorem 3. For the one-parameter exponential family (1), there exists a UMP test of the hypothesis $H_0$: $\theta \le \theta_1$ or $\theta \ge \theta_2$ ($\theta_1 < \theta_2$) against $H_1$: $\theta_1 < \theta < \theta_2$ that is of the form

(8) $$\varphi(x) = \begin{cases} 1 & \text{if } c_1 < T(x) < c_2,\\ \gamma_i & \text{if } T(x) = c_i, \ i = 1, 2 \ (c_1 < c_2),\\ 0 & \text{if } T(x) < c_1 \text{ or } > c_2, \end{cases}$$

where the $c$'s and the $\gamma$'s are given by

(9) $$E_{\theta_1}\varphi(X) = E_{\theta_2}\varphi(X) = \alpha.$$

See Lehmann [63, pp. 101-103] for proof.


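Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. To test $H_0$: $\mu \le \mu_0$ or $\mu \ge \mu_1$ ($\mu_1 > \mu_0$) against $H_1$: $\mu_0 < \mu < \mu_1$, the UMP test is given by

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } c_1 < \sum x_i < c_2,\\ \gamma_i & \text{if } \sum x_i = c_1 \text{ or } c_2,\\ 0 & \text{if } \sum x_i < c_1 \text{ or } > c_2, \end{cases}$$

where we determine $c_1$, $c_2$ from

$$\alpha = P_{\mu_0}\left\{c_1 < \sum X_i < c_2\right\} = P_{\mu_1}\left\{c_1 < \sum X_i < c_2\right\}$$

and $\gamma_1 = \gamma_2 = 0$. Thus

$$\alpha = P_{\mu_0}\left\{\frac{c_1 - n\mu_0}{\sqrt{n}} < \frac{\sum X_i - n\mu_0}{\sqrt{n}} < \frac{c_2 - n\mu_0}{\sqrt{n}}\right\} = P_{\mu_1}\left\{\frac{c_1 - n\mu_1}{\sqrt{n}} < \frac{\sum X_i - n\mu_1}{\sqrt{n}} < \frac{c_2 - n\mu_1}{\sqrt{n}}\right\} = P\left\{\frac{c_1 - n\mu_1}{\sqrt{n}} < Z < \frac{c_2 - n\mu_1}{\sqrt{n}}\right\},$$

where $Z$ is $\mathcal{N}(0, 1)$. Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$ from the simultaneous equations

$$\Phi\left(\frac{c_2 - n\mu_0}{\sqrt{n}}\right) - \Phi\left(\frac{c_1 - n\mu_0}{\sqrt{n}}\right) = \alpha,$$
$$\Phi\left(\frac{c_2 - n\mu_1}{\sqrt{n}}\right) - \Phi\left(\frac{c_1 - n\mu_1}{\sqrt{n}}\right) = \alpha,$$

where $\Phi$ is the DF of $Z$.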
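The two equations of Example 4 can be solved numerically. The sketch below is our own shortcut, not the text's method: it assumes a solution symmetric about $m = n(\mu_0 + \mu_1)/2$, writing $c_1 = m - d$, $c_2 = m + d$. With $\delta = n(\mu_1 - \mu_0)/2$, both equations then reduce to $\Phi((\delta + d)/\sqrt{n}) - \Phi((\delta - d)/\sqrt{n}) = \alpha$, whose left side increases in $d$, so bisection applies. The final checks confirm that both original equations hold. The values $n = 10$, $\mu_0 = 0$, $\mu_1 = 0.5$, $\alpha = 0.05$ are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # DF of Z ~ N(0, 1)


def two_sided_ump(n: int, mu0: float, mu1: float, alpha: float, tol: float = 1e-12):
    """Solve for c1 < c2 in Example 4, assuming c1, c2 symmetric about n(mu0 + mu1)/2."""
    m = n * (mu0 + mu1) / 2
    delta = n * (mu1 - mu0) / 2
    s = sqrt(n)

    def f(d: float) -> float:
        return Phi((delta + d) / s) - Phi((delta - d) / s) - alpha

    lo, hi = 0.0, 1.0
    while f(hi) < 0:  # f(0) = -alpha < 0; expand hi until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    d = (lo + hi) / 2
    return m - d, m + d


if __name__ == "__main__":
    n, mu0, mu1, alpha = 10, 0.0, 0.5, 0.05  # illustrative values
    c1, c2 = two_sided_ump(n, mu0, mu1, alpha)
    print(f"c1 = {c1:.4f}, c2 = {c2:.4f}")
    for mu in (mu0, mu1):
        print(mu, round(Phi((c2 - n * mu) / sqrt(n)) - Phi((c1 - n * mu) / sqrt(n)), 6))
```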


Remark 3. We caution the reader that UMP tests for testing $H_0$: $\theta_1 \le \theta \le \theta_2$ and $H_0'$: $\theta = \theta_0$ for the one-parameter exponential family do not exist. An example will suffice.


Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Since the family of joint PDFs of $\mathbf{X} = (X_1, \ldots, X_n)$ has an MLR in $T(\mathbf{X}) = \sum X_i^2$, it follows that UMP tests exist for one-sided hypotheses $\sigma \ge \sigma_0$ and $\sigma \le \sigma_0$.

Consider now the null hypothesis $H_0$: $\sigma = \sigma_0$ against the alternative $H_1$: $\sigma \ne \sigma_0$. We will show that a UMP test of $H_0$ does not exist. For testing $\sigma = \sigma_0$ against $\sigma > \sigma_0$, a test of the form

$$\varphi_1(\mathbf{x}) = \begin{cases} 1, & \sum x_i^2 > c_1,\\ 0, & \text{otherwise,} \end{cases}$$

is UMP, and for testing $\sigma = \sigma_0$ against $\sigma < \sigma_0$, a test of the form

$$\varphi_2(\mathbf{x}) = \begin{cases} 1, & \sum x_i^2 < c_2,\\ 0, & \text{otherwise,} \end{cases}$$

is UMP. If the size is chosen as $\alpha$, then $c_1 = \sigma_0^2\chi^2_{n,\alpha}$ and $c_2 = \sigma_0^2\chi^2_{n,1-\alpha}$. Clearly, neither $\varphi_1$ nor $\varphi_2$ is UMP for $H_0$ against $H_1$: $\sigma \ne \sigma_0$. The power of any test of $H_0$ for values $\sigma > \sigma_0$ cannot exceed that of $\varphi_1$, and for values of $\sigma < \sigma_0$ it cannot exceed the power of test $\varphi_2$. Hence no test of $H_0$ can be UMP (see Fig. 1).

Fig. 1. Power functions of chi-square tests of $H_0$: $\sigma = \sigma_0$ against $H_1$.


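PROBLEMS 9.4

1. For the following families of PMFs (PDFs) $f_\theta(x)$, $\theta \in \Theta \subseteq \mathcal{R}$, find a UMP size $\alpha$ test of $H_0$: $\theta \le \theta_0$ against $H_1$: $\theta > \theta_0$, based on a sample of $n$ observations:

(a) $f_\theta(x) = \theta^x(1 - \theta)^{1 - x}$, $x = 0, 1$; $0 < \theta < 1$.
(b) $f_\theta(x) = (1/\sqrt{2\pi})\exp\{-(x - \theta)^2/2\}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.
(c) $f_\theta(x) = e^{-\theta}(\theta^x/x!)$, $x = 0, 1, 2, \ldots$; $\theta > 0$.
(d) $f_\theta(x) = (1/\theta)e^{-x/\theta}$, $x > 0$, $\theta > 0$.
(e) $f_\theta(x) = [1/\Gamma(\theta)]x^{\theta - 1}e^{-x}$, $x > 0$, $\theta > 0$.
(f) $f_\theta(x) = \theta x^{\theta - 1}$, $0 < x < 1$, $\theta > 0$.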


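2. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from the PMF

$$P_N(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N; \quad N \in \{1, 2, \ldots\}.$$

(a) Show that the test

$$\varphi_1(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0,\\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le N_0, \end{cases}$$

is UMP size $\alpha$ for testing $H_0$: $N \le N_0$ against $H_1$: $N > N_0$.

(b) Show that

$$\varphi_2(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \alpha^{1/n}N_0,\\ 0 & \text{otherwise,} \end{cases}$$

is a UMP size $\alpha$ test of $H_0$: $N = N_0$ against $H_1$: $N \ne N_0$.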
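3. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $U(0, \theta)$, $\theta > 0$. Show that the test

$$\varphi_1(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0,\\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le \theta_0, \end{cases}$$

is UMP size $\alpha$ for testing $H_0$: $\theta \le \theta_0$ against $H_1$: $\theta > \theta_0$ and that the test

$$\varphi_2(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \theta_0\alpha^{1/n},\\ 0 & \text{otherwise,} \end{cases}$$

is UMP size $\alpha$ for $H_0$: $\theta = \theta_0$ against $H_1$: $\theta \ne \theta_0$.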
4. Does the Laplace family of PDFs

$$f_\theta(x) = \tfrac{1}{2}\exp(-|x - \theta|), \quad -\infty < x < \infty, \quad \theta \in \mathcal{R},$$

possess an MLR?


5. Let $X$ have logistic distribution with the PDF

$$f_\theta(x) = e^{-(x - \theta)}\left[1 + e^{-(x - \theta)}\right]^{-2}, \quad x \in \mathcal{R}.$$

Does $\{f_\theta\}$ belong to the exponential family? Does $\{f_\theta\}$ have MLR?


6. (a) Let $f_\theta$ be the PDF of a $\mathcal{N}(\theta, \theta)$ RV. Does $\{f_\theta\}$ have MLR?
(b) Do the same as in part (a) if $X \sim \mathcal{N}(\theta, \theta^2)$.


9.5 UNBIASED AND INVARIANT TESTS 
We have seen that if we restrict ourselves to the class $\Phi_\alpha$ of all size $\alpha$ tests, there do not exist UMP tests for many important hypotheses. This suggests that we reduce the class of tests under consideration by imposing certain restrictions.


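Definition 1. A size $\alpha$ test $\varphi$ of $H_0$: $\theta \in \Theta_0$ against the alternatives $H_1$: $\theta \in \Theta_1$ is said to be unbiased if
$$E_\theta\varphi(X) \ge \alpha \quad \text{for all } \theta \in \Theta_1. \tag{1}$$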


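It follows that a test $\varphi$ is unbiased if and only if its power function $\beta_\varphi(\theta)$ satisfies
$$\beta_\varphi(\theta) \le \alpha \quad \text{for } \theta \in \Theta_0 \tag{2}$$
and
$$\beta_\varphi(\theta) \ge \alpha \quad \text{for } \theta \in \Theta_1. \tag{3}$$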


This seems to be a reasonable requirement to place on a test. An unbiased test rejects a false $H_0$ more often than a true $H_0$.


Definition 2. Let $U_\alpha$ be the class of all unbiased size $\alpha$ tests of $H_0$. If there exists a test $\varphi \in U_\alpha$ that has maximum power at each $\theta \in \Theta_1$, we call $\varphi$ a UMP unbiased size $\alpha$ test.




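Clearly, $U_\alpha \subseteq \Phi_\alpha$. If a UMP test exists in $\Phi_\alpha$, it is UMP in $U_\alpha$. This follows by comparing the power of the UMP test with that of the trivial test $\varphi(x) \equiv \alpha$: the UMP test has power at least $\alpha$ at every $\theta \in \Theta_1$, so it is unbiased and therefore belongs to $U_\alpha$. It is convenient to introduce another class of tests.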


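Definition 3. A test $\varphi$ is said to be $\alpha$-similar on a subset $\Theta^*$ of $\Theta$ if
$$\beta_\varphi(\theta) = E_\theta\varphi(X) = \alpha \quad \text{for } \theta \in \Theta^*. \tag{4}$$

A test is said to be similar on a set $\Theta^* \subseteq \Theta$ if it is $\alpha$-similar on $\Theta^*$ for some $\alpha$, $0 \le \alpha \le 1$.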


It is clear that there exists at least one similar test on every $\Theta^*$, namely, $\varphi(x) \equiv \alpha$, $0 \le \alpha \le 1$.


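Theorem 1. Let $\beta_\varphi(\theta)$ be continuous in $\theta$ for any $\varphi$. If $\varphi$ is an unbiased size $\alpha$ test of $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$, it is $\alpha$-similar on the boundary $\Lambda = \bar\Theta_0 \cap \bar\Theta_1$. (Here $\bar A$ is the closure of set $A$.)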


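Proof. Let $\theta \in \Lambda$. Then there exists a sequence $\{\theta_n\}$, $\theta_n \in \Theta_0$, such that $\theta_n \to \theta$. Since $\beta_\varphi(\theta)$ is continuous, $\beta_\varphi(\theta_n) \to \beta_\varphi(\theta)$; and since $\beta_\varphi(\theta_n) \le \alpha$ for $\theta_n \in \Theta_0$, $\beta_\varphi(\theta) \le \alpha$. Similarly, there exists a sequence $\{\theta_n'\}$, $\theta_n' \in \Theta_1$, such that $\beta_\varphi(\theta_n') \ge \alpha$ ($\varphi$ is unbiased) and $\theta_n' \to \theta$. Thus $\beta_\varphi(\theta_n') \to \beta_\varphi(\theta)$, and it follows that $\beta_\varphi(\theta) \ge \alpha$. Hence $\beta_\varphi(\theta) = \alpha$ for $\theta \in \Lambda$, and $\varphi$ is $\alpha$-similar on $\Lambda$.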


Remark 1. Thus if $\beta_\varphi(\theta)$ is continuous in $\theta$ for any $\varphi$, an unbiased size $\alpha$ test of $H_0$ against $H_1$ is also $\alpha$-similar for the PDFs (PMFs) of $\Lambda$, that is, for $\{f_\theta, \theta \in \Lambda\}$. If we can find an MP similar test of $H_0$: $\theta \in \Lambda$ against $H_1$, and if this test is unbiased size $\alpha$, then necessarily it is MP in the smaller class.


Definition 4. A test $\varphi$ that is UMP among all $\alpha$-similar tests on the boundary $\Lambda = \bar\Theta_0 \cap \bar\Theta_1$ is said to be a UMP $\alpha$-similar test.


It is frequently easier to find a UMP $\alpha$-similar test. Moreover, tests that are UMP similar on the boundary are often UMP unbiased.


Theorem 2. Let the power function of every test $\varphi$ of $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$ be continuous in $\theta$. Then a UMP $\alpha$-similar test is UMP unbiased, provided that its size is $\alpha$ for testing $H_0$ against $H_1$.


Proof. Let $\varphi_0$ be UMP $\alpha$-similar. Since its size is $\alpha$, $E_\theta\varphi_0(X) \le \alpha$ for $\theta \in \Theta_0$. Comparing its power with that of the trivial similar test $\varphi(x) \equiv \alpha$, we see that $\varphi_0$ is unbiased also. By the continuity of $\beta_\varphi(\theta)$ and Theorem 1, we see that the class of all unbiased size $\alpha$ tests is a subclass of the class of all $\alpha$-similar tests. It follows that $\varphi_0$ is a UMP unbiased size $\alpha$ test.
Remark 2. The continuity of the power function $\beta_\varphi(\theta)$ is not always easy to check, but sufficient conditions may be found in most advanced calculus texts (see, for example, Widder [116, p. 356]). If the family of PDFs (PMFs) $f_\theta$ is an exponential family, a proof is given in Lehmann [63, p. 59].


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$. We wish to test $H_0$: $\mu \le 0$ against $H_1$: $\mu > 0$. Since the family of densities has an MLR in $\sum_{i=1}^n X_i$, we can use Theorem 9.4.2 to conclude that a UMP test rejects $H_0$ if $\sum_{i=1}^n X_i > c$. This test is also UMP unbiased. Nevertheless, we use this example to illustrate the concepts introduced above.

Here $\Theta_0 = \{\mu \le 0\}$, $\Theta_1 = \{\mu > 0\}$, and $\Lambda = \bar\Theta_0 \cap \bar\Theta_1 = \{\mu = 0\}$. Since $T(X) = \sum_{i=1}^n X_i$ is sufficient, we focus attention on tests based on $T$ alone. Note that $T \sim N(n\mu, n)$, which is one-parameter exponential. Thus the power function of any test $\varphi$ based on $T$ is continuous in $\mu$. It follows that any unbiased size $\alpha$ test of $H_0$ has the property $\beta_\varphi(0) = \alpha$ of similarity over $\Lambda$. In order to use Theorem 2, we find a UMP test of $H_0'$: $\mu \in \Lambda$ against $H_1$. Let $\mu_1 > 0$. By the fundamental lemma, an MP test of $\mu = 0$ against $\mu = \mu_1 > 0$ is given by


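$$\varphi(t) = \begin{cases} 1 & \text{if } \exp\left(t\mu_1 - n\mu_1^2/2\right) > k', \\ 0 & \text{otherwise} \end{cases} \;=\; \begin{cases} 1 & \text{if } t > k, \\ 0 & \text{if } t \le k, \end{cases}$$
where $k$ is determined from
$$\alpha = P_0\{T > k\} = P\left\{\frac{T}{\sqrt{n}} > \frac{k}{\sqrt{n}}\right\}.$$

Thus $k = \sqrt{n}\,z_\alpha$. Since $\varphi$ is independent of $\mu_1$ as long as $\mu_1 > 0$, we see that the test
$$\varphi(t) = \begin{cases} 1 & \text{if } t > \sqrt{n}\,z_\alpha, \\ 0 & \text{otherwise}, \end{cases}$$
is UMP $\alpha$-similar. We need only check that $\varphi$ is of the right size for testing $H_0$ against $H_1$. We have for $\mu \le 0$,
$$E_\mu\varphi(T) = P_\mu\{T > \sqrt{n}\,z_\alpha\} = P_\mu\left\{\frac{T - n\mu}{\sqrt{n}} > z_\alpha - \sqrt{n}\,\mu\right\} \le P\{Z > z_\alpha\},$$
since $-\sqrt{n}\,\mu \ge 0$. Here $Z$ is $N(0, 1)$. It follows that
$$E_\mu\varphi(T) \le \alpha \quad \text{for } \mu \le 0,$$
hence $\varphi$ is UMP unbiased.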
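As a numerical check on Example 1, the sketch below evaluates the power function of the test that rejects when $T = \sum X_i > \sqrt{n}\,z_\alpha$, using only the fact that $T \sim N(n\mu, n)$. It is a minimal illustration in Python (standard library only); the function names `critical_value` and `power` and the choices $n = 25$, $\alpha = 0.05$ are illustrative assumptions, not part of the text. The printed values should show $\beta(0) = \alpha$, $\beta(\mu) < \alpha$ for $\mu < 0$ and $\beta(\mu) > \alpha$ for $\mu > 0$, which is the unbiasedness argument above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def critical_value(n: int, alpha: float) -> float:
    """k = sqrt(n) * z_alpha, z_alpha being the upper alpha point of N(0, 1)."""
    return sqrt(n) * STD_NORMAL.inv_cdf(1.0 - alpha)


def power(mu: float, n: int, alpha: float) -> float:
    """beta(mu) = P_mu{T > k} for T = X_1 + ... + X_n ~ N(n mu, n)."""
    k = critical_value(n, alpha)
    # (T - n mu)/sqrt(n) ~ N(0, 1), hence P{T > k} = 1 - Phi(z_alpha - sqrt(n) mu).
    return 1.0 - STD_NORMAL.cdf(k / sqrt(n) - sqrt(n) * mu)


if __name__ == "__main__":
    n, alpha = 25, 0.05
    for mu in (-0.4, -0.1, 0.0, 0.1, 0.4):
        print(f"mu = {mu:+.1f}   power = {power(mu, n, alpha):.4f}")
```

Because the power function is increasing in $\mu$, its supremum over $\Theta_0$ is attained at the boundary $\mu = 0$, which is why checking similarity on $\Lambda$ is enough here.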


Theorem 2 can be used only if it is possible to find a UMP $\alpha$-similar test. Unfortunately, this requires heavy use of conditional expectation, and we will not pursue the subject any further. We refer to Lehmann [63, Chaps. 4 and 5] and Ferguson [25, pp. 224-233] for further details.

Yet another reduction is obtained if we apply the principle of invariance to hypothesis-testing problems. We recall that a class of distributions is invariant under a group of transformations $G$ if for every $g \in G$ and every $\theta \in \Theta$ there exists a unique $\theta' \in \Theta$ such that $g(X)$ has distribution $P_{\theta'}$ whenever $X \sim P_\theta$. We write $\theta' = \bar g\theta$.

In a hypothesis-testing problem we need to reformulate the principle of invariance. First, we need to ensure that under transformations $G$, not only does $\mathcal{P} = \{P_\theta: \theta \in \Theta\}$ remain invariant but also the problem of testing $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$ remains invariant. Second, since the problem has not changed by application of $G$, the decision also must not change.


Definition 5. A group $G$ of transformations on the space of values of $X$ leaves a hypothesis-testing problem invariant if $G$ leaves both $\{P_\theta: \theta \in \Theta_0\}$ and $\{P_\theta: \theta \in \Theta_1\}$ invariant.

Definition 6. We say that $\varphi$ is invariant under $G$ if
$$\varphi(g(x)) = \varphi(x) \quad \text{for all } x \text{ and all } g \in G.$$

Definition 7. Let $G$ be a group of transformations on the space of values of the RV $X$. We say that a statistic $T(x)$ is maximal invariant under $G$ if (a) $T$ is invariant; (b) $T$ is maximal, that is, $T(x_1) = T(x_2) \Rightarrow x_1 = g(x_2)$ for some $g \in G$.

Example 2. Let $x = (x_1, x_2, \ldots, x_n)$, and let $G$ be the group of translations
$$g_c(x) = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty.$$
Here the space of values of $X$ is $\mathbb{R}_n$. Consider the statistic
$$T(x) = (x_n - x_1, \ldots, x_n - x_{n-1}).$$
Clearly,
$$T(g_c(x)) = (x_n - x_1, \ldots, x_n - x_{n-1}) = T(x).$$
If $T(x) = T(x')$, then $x_n - x_i = x_n' - x_i'$, $i = 1, 2, \ldots, n-1$, and we have $x_i - x_i' = x_n - x_n' = c$ $(i = 1, 2, \ldots, n-1)$; that is, $g_c(x') = (x_1' + c, \ldots, x_n' + c) = x$ and $T$ is maximal invariant.

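Next consider the group of scale changes
$$g_c(x) = (cx_1, \ldots, cx_n), \quad c > 0.$$
Then
$$T(x) = \begin{cases} 0 & \text{if all } x_i = 0, \\ \left(\dfrac{x_1}{z}, \ldots, \dfrac{x_n}{z}\right) & \text{if at least one } x_i \ne 0, \quad z = \left(\sum_{i=1}^n x_i^2\right)^{1/2}, \end{cases}$$
is maximal invariant; for
$$T(g_c(x)) = T(cx_1, \ldots, cx_n) = T(x),$$
and if $T(x) = T(x')$, then either $T(x) = T(x') = 0$, in which case $x_i = x_i' = 0$, or $T(x) = T(x') \ne 0$, in which case $x_i/z = x_i'/z'$, implying that $x_i = (z/z')x_i' = cx_i'$ with $c = z/z' > 0$, and $T$ is maximal.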
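Finally, if we consider the group of translation and scale changes,
$$g(x) = (ax_1 + b, \ldots, ax_n + b), \quad a > 0, \; -\infty < b < \infty,$$
a maximal invariant is
$$T(x) = \begin{cases} 0 & \text{if } s = 0, \\ \left(\dfrac{x_1 - \bar x}{s}, \dfrac{x_2 - \bar x}{s}, \ldots, \dfrac{x_n - \bar x}{s}\right) & \text{if } s \ne 0, \end{cases}$$
where $\bar x = n^{-1}\sum_{i=1}^n x_i$ and $s^2 = n^{-1}\sum_{i=1}^n (x_i - \bar x)^2$.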
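The three maximal invariants of Example 2 are easy to code, and a quick numerical check confirms that each is unchanged by the corresponding group action. The sketch below is a minimal illustration in Python (standard library only); the function names, the helper `approx_equal`, and the sample vector are assumptions made for the illustration. The final comment notes a maximality-related fact: a sign flip is not a positive scale change, so the scale invariant can tell $x$ from $-x$.

```python
from math import sqrt


def translation_invariant(x):
    """T(x) = (x_n - x_1, ..., x_n - x_{n-1}) for the group x -> x + c."""
    return tuple(x[-1] - xi for xi in x[:-1])


def scale_invariant(x):
    """T(x) = x / z with z = (sum x_i^2)^(1/2), and T(x) = 0 when x = 0, for x -> c x, c > 0."""
    z = sqrt(sum(xi * xi for xi in x))
    if z == 0.0:
        return tuple(0.0 for _ in x)
    return tuple(xi / z for xi in x)


def location_scale_invariant(x):
    """T(x) = ((x_i - xbar)/s)_i with s^2 = n^(-1) sum (x_i - xbar)^2, and T(x) = 0 when s = 0."""
    n = len(x)
    xbar = sum(x) / n
    s = sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    if s == 0.0:
        return tuple(0.0 for _ in x)
    return tuple((xi - xbar) / s for xi in x)


def approx_equal(u, v, tol=1e-12):
    """Componentwise comparison that tolerates floating-point rounding."""
    return len(u) == len(v) and all(
        abs(a - b) <= tol * max(1.0, abs(a), abs(b)) for a, b in zip(u, v)
    )


if __name__ == "__main__":
    x = (1.0, 4.0, -2.0, 0.5)
    print(approx_equal(translation_invariant(x), translation_invariant([xi + 3.7 for xi in x])))
    print(approx_equal(scale_invariant(x), scale_invariant([2.5 * xi for xi in x])))
    print(approx_equal(location_scale_invariant(x),
                       location_scale_invariant([3.0 * xi - 1.0 for xi in x])))
    # x and -x are not related by any c > 0, and the scale invariant distinguishes them.
    print(approx_equal(scale_invariant(x), scale_invariant([-xi for xi in x])))
```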








Definition 8. Let $I_\alpha$ denote the class of all invariant size $\alpha$ tests of $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$. If there exists a UMP member in $I_\alpha$, we call the test a UMP invariant test of $H_0$ against $H_1$.


The search for UMP invariant tests is greatly facilitated by use of the following 
result. 


Theorem 3. Let $T(x)$ be maximal invariant with respect to $G$. Then $\varphi$ is invariant under $G$ if and only if $\varphi$ is a function of $T$.

Proof. Let $\varphi$ be invariant. We have to show that $T(x_1) = T(x_2) \Rightarrow \varphi(x_1) = \varphi(x_2)$. If $T(x_1) = T(x_2)$, there is a $g \in G$ such that $x_1 = g(x_2)$, so that $\varphi(x_1) = \varphi(g(x_2)) = \varphi(x_2)$.

Conversely, if $\varphi$ is a function of $T$, $\varphi(x) = h[T(x)]$, then
$$\varphi(g(x)) = h[T(g(x))] = h[T(x)] = \varphi(x),$$
and $\varphi$ is invariant.


Remark 3. The use of Theorem 3 is obvious. If a hypothesis-testing problem is invariant under a group $G$, the principle of invariance restricts attention to invariant tests. According to Theorem 3, it suffices to restrict attention to test functions that are functions of the maximal invariant $T$.


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. We wish to test $H_0$: $\sigma \ge \sigma_0$, $-\infty < \mu < \infty$, against $H_1$: $\sigma < \sigma_0$, $-\infty < \mu < \infty$. The family $\{N(\mu, \sigma^2)\}$ remains invariant under translations $x_i' = x_i + c$, $-\infty < c < \infty$. Moreover, since $\operatorname{var}(X + c) = \operatorname{var}(X)$, the hypothesis-testing problem remains invariant under the group of translations; that is, both $\{N(\mu, \sigma^2): \sigma^2 \ge \sigma_0^2\}$ and $\{N(\mu, \sigma^2): \sigma^2 < \sigma_0^2\}$ remain invariant. The joint sufficient statistic is $\left(\bar X, \sum(X_i - \bar X)^2\right)$, which is transformed to $\left(\bar X + c, \sum(X_i - \bar X)^2\right)$ under translations. A maximal invariant is $\sum(X_i - \bar X)^2$. It follows that the class of invariant tests consists of tests that are functions of $\sum(X_i - \bar X)^2$.

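Now $\sum(X_i - \bar X)^2/\sigma^2 \sim \chi^2(n-1)$, so that the PDF of $Z = \sum(X_i - \bar X)^2$ is given by
$$f_{\sigma^2}(z) = \frac{z^{(n-3)/2}e^{-z/(2\sigma^2)}}{\sigma^{n-1}2^{(n-1)/2}\Gamma[(n-1)/2]}, \quad z > 0.$$

The family of densities $\{f_{\sigma^2}: \sigma^2 > 0\}$ has an MLR in $z$, and it follows that a UMP test is to reject $H_0$: $\sigma^2 \ge \sigma_0^2$ if $z \le k$, that is, a UMP invariant test is given by
$$\varphi(x) = \begin{cases} 1 & \text{if } \sum(x_i - \bar x)^2 \le k, \\ 0 & \text{if } \sum(x_i - \bar x)^2 > k, \end{cases}$$
where $k$ is determined from the size restriction
$$\alpha = P_{\sigma_0}\left\{\sum(X_i - \bar X)^2 \le k\right\} = P\left\{\frac{\sum(X_i - \bar X)^2}{\sigma_0^2} \le \frac{k}{\sigma_0^2}\right\},$$
that is,
$$k = \sigma_0^2\,\chi^2_{n-1,1-\alpha}.$$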
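The cutoff $k$ in Example 3 needs the point of $\chi^2(n-1)$ with probability $\alpha$ to its left, which is what the size equation $\alpha = P\{\chi^2(n-1) \le k/\sigma_0^2\}$ requires. The minimal sketch below computes it without external libraries, assuming the standard series for the regularized lower incomplete gamma function and a plain bisection search. The names `chi2_cdf`, `chi2_lower_quantile` and `umpi_critical_value` are hypothetical, and the example values ($n = 10$, $\sigma_0^2 = 4$, $\alpha = 0.05$) are illustrative only.

```python
from math import exp, isclose, lgamma, log


def chi2_cdf(x: float, df: int) -> float:
    """P{chi^2(df) <= x} = P(df/2, x/2), using the series
    P(a, y) = y^a e^(-y) / Gamma(a) * sum_{k>=0} y^k / (a (a+1) ... (a+k))."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of the sum
    total = term
    k = 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= y / (a + k)
        total += term
    return exp(a * log(y) - y - lgamma(a)) * total


def chi2_lower_quantile(p: float, df: int, tol: float = 1e-12) -> float:
    """The point q with P{chi^2(df) <= q} = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def umpi_critical_value(n: int, sigma0_sq: float, alpha: float) -> float:
    """k = sigma_0^2 * chi^2_{n-1,1-alpha}: reject H0 when sum (x_i - xbar)^2 <= k."""
    return sigma0_sq * chi2_lower_quantile(alpha, n - 1)


if __name__ == "__main__":
    k = umpi_critical_value(10, 4.0, 0.05)
    print(k, isclose(chi2_cdf(k / 4.0, 9), 0.05, abs_tol=1e-9))
```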


Example 4. Let $X$ have PDF $f_i(x_1 - \theta, \ldots, x_n - \theta)$ under $H_i$ $(i = 0, 1)$, $-\infty < \theta < \infty$. Let $G$ be the group of translations
$$g_c(x) = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty, \; n \ge 2.$$
Clearly, $g_c$ induces $\bar g_c$ on $\Theta$, where $\bar g_c\theta = \theta + c$. The hypothesis-testing problem remains invariant under $G$. A maximal invariant under $G$ is $T(X) = (X_1 - X_n, \ldots, X_{n-1} - X_n) = (T_1, T_2, \ldots, T_{n-1})$. The class of invariant tests coincides with the class of tests that are functions of $T$. The PDF of $T$ under $H_i$ is independent of $\theta$ and is given by $\int_{-\infty}^{\infty} f_i(t_1 + z, \ldots, t_{n-1} + z, z)\,dz$. The problem is thus reduced to testing a simple hypothesis against a simple alternative. By the fundamental lemma the MP test
$$\varphi(t_1, t_2, \ldots, t_{n-1}) = \begin{cases} 1 & \text{if } \lambda(t) > c, \\ 0 & \text{if } \lambda(t) < c, \end{cases}$$
where $t = (t_1, t_2, \ldots, t_{n-1})$ and
$$\lambda(t) = \frac{\int_{-\infty}^{\infty} f_1(t_1 + z, \ldots, t_{n-1} + z, z)\,dz}{\int_{-\infty}^{\infty} f_0(t_1 + z, \ldots, t_{n-1} + z, z)\,dz},$$
is UMP invariant.

A particular case of Example 4 will be, for instance, to test $H_0$: $X \sim N(\theta, 1)$ against $H_1$: $X \sim C(1, \theta)$, $\theta \in \mathbb{R}$ (see Problem 1).


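Example 5. Suppose that $(X, Y)$ has joint PDF
$$f_\theta(x, y) = \lambda\mu\exp(-\lambda x - \mu y), \quad x > 0, \; y > 0,$$
and $= 0$ elsewhere, where $\theta = (\lambda, \mu)$, $\lambda > 0$, $\mu > 0$. Consider the scale group $G = \{g_c: c > 0\}$, $g_c(x, y) = (cx, cy)$, which leaves $\{f_\theta\}$ invariant. Suppose that we wish to test $H_0$: $\mu \ge \lambda$ against $H_1$: $\mu < \lambda$. It is easy to see that $\bar G\Theta_0 = \Theta_0$, so that $G$ leaves $(\Theta, \Theta_0, \Theta_1)$ invariant and $T = Y/X$ is maximal invariant. The PDF of $T$ is given by
$$f_\theta^T(t) = \frac{\lambda\mu}{(\lambda + \mu t)^2}, \quad t > 0; \qquad = 0 \text{ for } t \le 0.$$

The family $\{f_\theta^T\}$ has MLR in $T$, and hence a UMP invariant test of $H_0$ is of the form
$$\varphi(t) = \begin{cases} 1, & t > c(\alpha), \\ \gamma, & t = c(\alpha), \\ 0, & t < c(\alpha), \end{cases}$$
where
$$\alpha = \int_{c(\alpha)}^{\infty}\frac{1}{(1+t)^2}\,dt \quad\Rightarrow\quad c(\alpha) = \frac{1-\alpha}{\alpha}.$$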
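For Example 5 the whole power function is available in closed form. Writing $\rho = \mu/\lambda$, one has $P\{T > t\} = E\,e^{-\mu t X} = \lambda/(\lambda + \mu t) = 1/(1 + \rho t)$. So the test has power $1/(1 + \rho\,c(\alpha))$, which equals $\alpha$ at $\rho = 1$, is at most $\alpha$ on $H_0$ ($\rho \ge 1$), and exceeds $\alpha$ on $H_1$ ($\rho < 1$). The minimal Python sketch below checks this formula against a seeded Monte Carlo simulation. The function names, the seed, and the replication count are illustrative assumptions, and `random.expovariate` is parametrized by the rate, matching the densities $\lambda e^{-\lambda x}$ and $\mu e^{-\mu y}$.

```python
import random


def critical_value(alpha: float) -> float:
    """c(alpha) solving alpha = integral_c^inf (1 + t)^(-2) dt = 1/(1 + c)."""
    return (1.0 - alpha) / alpha


def exact_power(rho: float, alpha: float) -> float:
    """P{T > c(alpha)} when mu = rho * lambda, using P{Y/X > t} = 1/(1 + rho t)."""
    return 1.0 / (1.0 + rho * critical_value(alpha))


def simulated_rejection_rate(lam: float, mu: float, alpha: float,
                             reps: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P{Y/X > c(alpha)} with X ~ Exp(rate lam), Y ~ Exp(rate mu)."""
    rng = random.Random(seed)
    c = critical_value(alpha)
    hits = 0
    for _ in range(reps):
        x = rng.expovariate(lam)
        y = rng.expovariate(mu)
        if y > c * x:          # same event as Y/X > c, without dividing by x
            hits += 1
    return hits / reps


if __name__ == "__main__":
    alpha = 0.05
    for rho in (2.0, 1.0, 0.5, 0.1):
        print(rho, exact_power(rho, alpha))
    print(simulated_rejection_rate(3.0, 3.0, alpha))   # boundary lambda = mu
```

Since $T$ has a continuous distribution, the randomization value $\gamma$ at $t = c(\alpha)$ plays no role.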
PROBLEMS 9.5

1. To test $H_0$: $X \sim N(\theta, 1)$ against $H_1$: $X \sim C(1, \theta)$, a sample of size 2 is available on $X$. Find a UMP invariant test of $H_0$ against $H_1$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Find a UMP unbiased size $\alpha$ test for the null hypothesis $H_0$: $\lambda \le \lambda_0$ against alternatives $\lambda > \lambda_0$ by the methods of this section.

3. Let $X \sim NB(1; \theta)$. By the methods of this section, find a UMP unbiased size $\alpha$ test of $H_0$: $\theta \ge \theta_0$ against $H_1$: $\theta < \theta_0$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Consider the problem of testing $H_0$: $\mu \le 0$ against $H_1$: $\mu > 0$.
(a) It suffices to restrict attention to the sufficient statistic $(U, V)$, where $U = \bar X$ and $V = S^2$. Show that the problem of testing $H_0$ is invariant under the scale group $G = \{g_a: a > 0\}$, $g_a(u, v) = (au, a^2v)$, and a maximal invariant is $T = U/\sqrt{V}$.
(b) Show that the distribution of $T$ has MLR, and a UMP invariant test rejects $H_0$ when $T > c$.

5. Let $X_1, X_2, \ldots, X_n$ be iid RVs and let $H_0$ be that $X_i \sim N(\theta, 1)$ and $H_1$ be that the common PDF is $f_\theta(x) = \frac{1}{2}\exp(-|x - \theta|)$. Find the form of the UMP invariant test of $H_0$ against $H_1$.

6. Let $X_1, X_2, \ldots, X_n$ be iid RVs and suppose that $H_0$: $X_i \sim N(0, 1)$ and $H_1$: $X_i \sim f_1(x) = \exp(-|x|)/2$.
(a) Show that the problem of testing $H_0$ against $H_1$ is invariant under scale changes $g_c(x) = cx$, $c > 0$, and a maximal invariant is $T(X) = (X_1/X_n, \ldots, X_{n-1}/X_n)$.
(b) Show that the MP invariant test rejects $H_0$ when
$$\frac{1 + \sum_{j=1}^{n-1}|Y_j|}{\left(1 + \sum_{j=1}^{n-1}Y_j^2\right)^{1/2}} < k,$$
where $Y_j = X_j/X_n$, $j = 1, 2, \ldots, n-1$, or equivalently, when
$$\frac{\sum_{i=1}^n |X_i|}{\left(\sum_{i=1}^n X_i^2\right)^{1/2}} < k.$$


9.6 LOCALLY MOST POWERFUL TESTS 


In the preceding section we argued that whenever a UMP test does not exist, we restrict the class of tests under consideration and then find a UMP test in the subclass. Yet another approach when no UMP test exists is to restrict the parameter set to a subset of $\Theta_1$. In most problems, the parameter values that are close to the null hypothesis are the hardest to detect. Tests that have good power properties for "local alternatives" may also retain good power properties for "nonlocal" alternatives.


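Definition 1. Let $\Theta \subseteq \mathbb{R}$. Then a test $\varphi_0$ with power function $\beta_{\varphi_0}(\theta) = E_\theta\varphi_0(X)$ is said to be a locally most powerful (LMP) test of $H_0$: $\theta \le \theta_0$ against $H_1$: $\theta > \theta_0$ if there exists a $\Delta > 0$ such that for any other test $\varphi$ with
$$\beta_\varphi(\theta_0) = \beta_{\varphi_0}(\theta_0) = \int \varphi_0(x)f_{\theta_0}(x)\,dx, \tag{1}$$
we have
$$\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for every } \theta \in (\theta_0, \theta_0 + \Delta]. \tag{2}$$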


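We assume that the tests under consideration have continuously differentiable power function at $\theta = \theta_0$ and the derivative may be taken under the integral sign. In that case, an LMP test maximizes
$$\left.\frac{\partial}{\partial\theta}\beta_\varphi(\theta)\right|_{\theta=\theta_0} = \beta_\varphi'(\theta_0) = \int \varphi(x)\left.\frac{\partial}{\partial\theta}f_\theta(x)\right|_{\theta=\theta_0}dx \tag{3}$$
subject to the size constraint (1). A slight extension of the Neyman-Pearson lemma (Remark 9.3.2) implies that a test satisfying (1) and given by
$$\varphi_0(x) = \begin{cases} 1 & \text{if } \left.\dfrac{\partial}{\partial\theta}f_\theta(x)\right|_{\theta_0} > k f_{\theta_0}(x), \\[1ex] \gamma & \text{if } \left.\dfrac{\partial}{\partial\theta}f_\theta(x)\right|_{\theta_0} = k f_{\theta_0}(x), \\[1ex] 0 & \text{if } \left.\dfrac{\partial}{\partial\theta}f_\theta(x)\right|_{\theta_0} < k f_{\theta_0}(x), \end{cases} \tag{4}$$
will maximize $\beta_\varphi'(\theta_0)$. It is possible that a test that maximizes $\beta_\varphi'(\theta_0)$ is not LMP, but if the test maximizes $\beta_\varphi'(\theta_0)$ and is unique, it must be an LMP test (see Kallenberg et al. [47, p. 290] and Lehmann [63, p. 528]).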

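Note that for $x$ for which $f_{\theta_0}(x) \ne 0$, we can write
$$\frac{\left.\frac{\partial}{\partial\theta}f_\theta(x)\right|_{\theta_0}}{f_{\theta_0}(x)} = \left.\frac{\partial}{\partial\theta}\log f_\theta(x)\right|_{\theta_0},$$
and we can rewrite (4) as
$$\varphi_0(x) = \begin{cases} 1 & \text{if } \left.\dfrac{\partial}{\partial\theta}\log f_\theta(x)\right|_{\theta_0} > k, \\[1ex] \gamma & \text{if } \left.\dfrac{\partial}{\partial\theta}\log f_\theta(x)\right|_{\theta_0} = k, \\[1ex] 0 & \text{if } \left.\dfrac{\partial}{\partial\theta}\log f_\theta(x)\right|_{\theta_0} < k. \end{cases} \tag{5}$$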
Example 1. Let $X_1, X_2, \ldots, X_n$ be iid with common normal PDF with mean $\mu$ and variance $\sigma^2$. If one of these parameters is unknown while the other is known, the family of PDFs has MLR, and UMP tests exist for one-sided hypotheses for the unknown parameter. Let us derive the LMP test in each case.

First consider the case when $\sigma^2$ is known, say $\sigma^2 = 1$, and $H_0$: $\mu \le 0$, $H_1$: $\mu > 0$. An easy computation shows that an LMP test is of the form
$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i > k, \\ 0 & \text{otherwise}, \end{cases}$$
which, of course, is the form of the UMP test obtained in Problem 9.4.1 by an application of Theorem 9.4.2.

Next consider the case when $\mu$ is known, say $\mu = 0$, and $H_0$: $\sigma \le \sigma_0$, $H_1$: $\sigma > \sigma_0$. Using (5), we see that an LMP test is of the form
$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i^2 > k, \\ 0 & \text{if } \sum_{i=1}^n x_i^2 \le k, \end{cases}$$
which coincides with the UMP test.

In each case the power function is differentiable and the derivatives may be taken inside the integral sign because the PDF is a one-parameter exponential type PDF.
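Form (5) says that the LMP test rejects for large values of the score $\partial\log f_\theta(x)/\partial\theta$ at $\theta_0$. In Example 1 the score in $\mu$ at $\mu = 0$ is $\sum x_i$. The score in $\sigma$ at $\sigma_0$ is $-n/\sigma_0 + \sum x_i^2/\sigma_0^3$, an increasing function of $\sum x_i^2$. The minimal sketch below checks both expressions by a central finite difference of the normal log-likelihood. It is a numerical illustration only; the function names, the step size `h`, and the data are assumptions, not part of the text.

```python
import math


def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def loglik_in_mu(xs, mu):
    """Log-likelihood as a function of mu, with sigma = 1 known."""
    return sum(normal_logpdf(x, mu, 1.0) for x in xs)


def loglik_in_sigma(xs, sigma):
    """Log-likelihood as a function of sigma, with mu = 0 known."""
    return sum(normal_logpdf(x, 0.0, sigma) for x in xs)


def score(loglik, xs, theta0, h=1e-5):
    """Central-difference approximation to d/dtheta loglik(xs, theta) at theta0."""
    return (loglik(xs, theta0 + h) - loglik(xs, theta0 - h)) / (2.0 * h)


if __name__ == "__main__":
    xs = [0.3, -1.2, 2.5, 0.8, -0.4]
    print(score(loglik_in_mu, xs, 0.0), sum(xs))
    s0 = 1.5
    print(score(loglik_in_sigma, xs, s0), -len(xs) / s0 + sum(x * x for x in xs) / s0 ** 3)
```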


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF
$$f_\theta(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \quad x \in \mathbb{R},$$
and consider the problem of testing $H_0$: $\theta \le 0$ against $H_1$: $\theta > 0$.

In this case $\{f_\theta\}$ does not have MLR. A direct computation using the Neyman-Pearson lemma shows that an MP test of $\theta = 0$ against $\theta = \theta_1$, $\theta_1 > 0$, depends on $\theta_1$ and hence cannot be MP for testing $\theta = 0$ against $\theta = \theta_2$, $\theta_2 \ne \theta_1$. Hence a UMP test of $H_0$ against $H_1$ does not exist. An LMP test of $H_0$ against $H_1$ is of the form
$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \dfrac{2x_i}{1 + x_i^2} > k, \\ 0 & \text{otherwise}, \end{cases}$$
where $k$ is chosen so that the size of $\varphi_0$ is $\alpha$. For small $n$ it is hard to compute $k$, but for large $n$ it is easy to compute $k$ using the central limit theorem. Indeed, under $\theta = 0$ the $2X_i/(1 + X_i^2)$ are iid RVs with mean 0 and finite variance $1/2$, so that $k = z_\alpha\sqrt{n/2}$ will give an (approximate) level $\alpha$ test for large $n$.

The test $\varphi_0$ is good at detecting small departures from $\theta \le 0$, but it is quite unsatisfactory in detecting values of $\theta$ away from 0. In fact, for $\alpha < 1/2$, $\beta_{\varphi_0}(\theta) \to 0$ as $\theta \to \infty$.
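The behavior described in Example 2 can be seen in a small simulation. The sketch rejects when $\sum 2x_i/(1+x_i^2) > z_\alpha\sqrt{n/2}$ and estimates the rejection rate at $\theta = 0$ (approximate size), at a nearby alternative, and at a distant one. It is a minimal Monte Carlo illustration in Python, not an exact power computation. The names, the seed, $n = 50$ and the replication count are illustrative assumptions, and Cauchy variates are generated by the inverse-CDF method $\theta + \tan(\pi(U - 1/2))$.

```python
import math
import random
from statistics import NormalDist


def lmp_statistic(xs):
    """Score at theta = 0: sum_i 2 x_i / (1 + x_i^2)."""
    return sum(2.0 * x / (1.0 + x * x) for x in xs)


def lmp_cutoff(n: int, alpha: float) -> float:
    """Large-sample cutoff k = z_alpha * sqrt(n/2); each score term has mean 0, variance 1/2 at theta = 0."""
    return NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(n / 2.0)


def rejection_rate(theta: float, n: int, alpha: float,
                   reps: int = 10_000, seed: int = 2024) -> float:
    """Monte Carlo estimate of the power beta(theta) of the approximate LMP test."""
    rng = random.Random(seed)
    k = lmp_cutoff(n, alpha)
    hits = 0
    for _ in range(reps):
        # Cauchy(theta) draws by the inverse-CDF method.
        xs = [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        if lmp_statistic(xs) > k:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for theta in (0.0, 1.0, 50.0):
        print(theta, rejection_rate(theta, 50, 0.05))
```

Far from the null value most observations are large, each term $2x_i/(1+x_i^2)$ is close to 0, and the statistic rarely exceeds the positive cutoff $k$. This is why the power falls back toward 0.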

This procedure for finding locally best tests has applications in nonparametric 
statistics. We refer the reader to Randles and Wolfe [83, Sec. 9.1] for details. 


PROBLEMS 9.6 


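1. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, \theta)$ RVs. Show that $E_0(1 + X_1^2)^{-k} = (1/\pi)B\left(k + \tfrac12, \tfrac12\right)$. Hence or otherwise, show that
$$E_0\left[\frac{X_1}{1 + X_1^2}\right] = 0 \quad\text{and}\quad \operatorname{var}_0\left[\frac{X_1}{1 + X_1^2}\right] = E_0\left[\frac{X_1^2}{(1 + X_1^2)^2}\right] = \frac18.$$

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from the logistic PDF
$$f_\theta(x) = \frac{e^{-(x-\theta)}}{[1 + e^{-(x-\theta)}]^2}, \quad x \in \mathbb{R}.$$
Show that the LMP test of $H_0$: $\theta = 0$ against $H_1$: $\theta > 0$ rejects $H_0$ if $\sum_{i=1}^n \tanh(x_i/2) > k$.

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with the common Laplace PDF
$$f_\theta(x) = \tfrac12\exp(-|x - \theta|).$$
For $n \ge 2$, show that a UMP size $\alpha$ $(0 < \alpha < 1)$ test of $H_0$: $\theta \le 0$ against $H_1$: $\theta > 0$ does not exist. Find the form of the LMP test.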


CHAPTER 10


Some Further Results of 
Hypothesis Testing 


10.1 INTRODUCTION


In this chapter we study some commonly used procedures in the theory of testing 
of hypotheses. In Section 10.2 we describe the classical procedure for constructing 
tests based on likelihood ratios. This method is sufficiently general to apply to multi- 
parameter problems and is especially useful in the presence of nuisance parameters. 
These are unknown parameters in the model which are of no inferential interest. Most 
of the normal theory tests described in Sections 10.3 to 10.5 and those in Chapter 12 
can be derived by using methods of Section 10.2. In Sections 10.3 to 10.5 we list 
some commonly used normal theory-based tests. In Section 10.3 we also deal with 
goodness-of-fit tests. In Section 10.6 we look at the hypothesis testing problem from 
a decision-theoretic viewpoint and describe Bayes and minimax tests. 


10.2 GENERALIZED LIKELIHOOD RATIO TESTS 


In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis testing. It was suggested that we restrict attention to smaller classes of tests and seek UMP tests in these subclasses or, alternatively, seek tests that are optimal against local alternatives. Unfortunately, some of the reductions suggested in Chapter 9, such as invariance, do not apply to all families of distributions.

In this section we consider a classical procedure for constructing tests that has some intuitive appeal and that frequently, though not necessarily, leads to optimal tests. Also, the procedure leads to tests that have some desirable large-sample properties.

Recall that for testing $H_0$: $X \sim f_0$ against $H_1$: $X \sim f_1$, the Neyman-Pearson MP test is based on the ratio $f_1(x)/f_0(x)$. If we interpret the numerator as the best possible explanation of $x$ under $H_1$, and the denominator as the best possible explanation



of $x$ under $H_0$, it is reasonable to consider the ratio
$$r(x) = \frac{\sup_{\theta\in\Theta_1} L(\theta; x)}{\sup_{\theta\in\Theta_0} L(\theta; x)} = \frac{\sup_{\theta\in\Theta_1} f_\theta(x)}{\sup_{\theta\in\Theta_0} f_\theta(x)}$$
as a test statistic for testing $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$. Here $L(\theta; x)$ is the likelihood function of $x$. Note that for each $x$ for which the MLEs of $\theta$ under $\Theta_1$ and $\Theta_0$ exist, the ratio is well defined and free of $\theta$ and can be used as a test statistic. Clearly, we should reject $H_0$ if $r(x) > c$.

The statistic $r$ is hard to compute; only one of the two suprema in the ratio may be attained. Let $\theta \in \Theta \subseteq \mathbb{R}_k$ be a vector of parameters, and let $X$ be a random vector with PDF (PMF) $f_\theta$. Consider the problem of testing the null hypothesis $H_0$: $X \sim f_\theta$, $\theta \in \Theta_0$ against the alternative $H_1$: $X \sim f_\theta$, $\theta \in \Theta_1$.


Definition 1. For testing $H_0$ against $H_1$, a test of the form: reject $H_0$ if and only if $\lambda(x) < c$, where $c$ is a constant, and
$$\lambda(x) = \frac{\sup_{\theta\in\Theta_0} f_\theta(x_1, x_2, \ldots, x_n)}{\sup_{\theta\in\Theta} f_\theta(x_1, x_2, \ldots, x_n)},$$
is called a generalized likelihood ratio (GLR) test.


We leave it to the reader to show that the statistics $\lambda(X)$ and $r(X)$ lead to the same criterion for rejecting $H_0$.

The numerator of the likelihood ratio $\lambda$ is the best explanation of $X$ (in the sense of maximum likelihood) that the null hypothesis $H_0$ can provide, and the denominator is the best possible explanation of $X$. $H_0$ is rejected if there is a much better explanation of $X$ than the best one provided by $H_0$.

It is clear that $0 \le \lambda \le 1$. The constant $c$ is determined from the size restriction
$$\sup_{\theta\in\Theta_0} P_\theta\{\lambda(X) < c\} = \alpha.$$

If the distribution of $\lambda$ is continuous (that is, the DF is absolutely continuous), any size $\alpha$ is attainable. If, however, $\lambda(X)$ is a discrete RV, it may not be possible to find a likelihood ratio test whose size exactly equals $\alpha$. This problem arises because of the nonrandomized nature of the likelihood ratio test and can be handled by randomization. The following result holds.


Theorem 1. If for given $\alpha$, $0 \le \alpha \le 1$, nonrandomized Neyman-Pearson and likelihood ratio tests of a simple hypothesis against a simple alternative exist, they are equivalent.

The proof is left as an exercise.
Theorem 2. For testing $\theta \in \Theta_0$ against $\theta \in \Theta_1$, the likelihood ratio test is a function of every sufficient statistic for $\theta$.

Theorem 2 follows from the factorization theorem for sufficient statistics.


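Example 1. Let $X \sim b(n, p)$, and we seek a level $\alpha$ likelihood ratio test of $H_0$: $p \le p_0$ against $H_1$: $p > p_0$ (the factor $\binom{n}{x}$ cancels in the ratio):
$$\lambda(x) = \frac{\sup_{p \le p_0} p^x(1-p)^{n-x}}{\sup_{0 \le p \le 1} p^x(1-p)^{n-x}}.$$
Now
$$\sup_{0 \le p \le 1} p^x(1-p)^{n-x} = \left(\frac{x}{n}\right)^x\left(1 - \frac{x}{n}\right)^{n-x}.$$
The function $p^x(1-p)^{n-x}$ first increases, then achieves its maximum at $p = x/n$, and finally decreases, so that
$$\sup_{p \le p_0} p^x(1-p)^{n-x} = \begin{cases} p_0^x(1-p_0)^{n-x} & \text{if } p_0 < x/n, \\ (x/n)^x(1 - x/n)^{n-x} & \text{if } x/n \le p_0. \end{cases}$$
It follows that
$$\lambda(x) = \begin{cases} \dfrac{p_0^x(1-p_0)^{n-x}}{(x/n)^x(1 - x/n)^{n-x}} & \text{if } x > np_0, \\[1ex] 1 & \text{if } x \le np_0. \end{cases}$$
Note that $\lambda(x) < 1$ for $np_0 < x$ and $\lambda(x) = 1$ if $x \le np_0$. Moreover, for $x > np_0$ the derivative of $\log\lambda(x)$ with respect to $x$ is $\log[p_0(n-x)/((1-p_0)x)] < 0$, and it follows that $\lambda(x)$ is a nonincreasing function of $x$. Thus $\lambda(x) < c$ if and only if $x > c'$, and the GLR test rejects $H_0$ if $x > c'$.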

The GLR test is of the type obtained in Section 9.4 for families with an MLR except for the boundary $\lambda(x) = c$. In other words, if the size of the test happens to be exactly $\alpha$, the likelihood ratio test is a UMP level $\alpha$ test. Since $X$ is a discrete RV, however, to obtain size $\alpha$ may not be possible. We have
$$\alpha = \sup_{p \le p_0} P_p\{X > c'\} = P_{p_0}\{X > c'\}.$$

If such a $c'$ does not exist, we choose an integer $c'$ such that
$$P_{p_0}\{X > c'\} \le \alpha \quad\text{and}\quad P_{p_0}\{X > c' - 1\} > \alpha.$$
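The two computations in Example 1 can be sketched numerically. The first is the GLR statistic $\lambda(x)$, which should be nonincreasing in $x$. The second is the integer cutoff $c'$ with $P_{p_0}\{X > c'\} \le \alpha < P_{p_0}\{X > c' - 1\}$. The sketch below is a minimal Python illustration assuming $0 < p_0 < 1$; the function names and the example values $n = 10$, $p_0 = 0.5$, $\alpha = 0.05$ are illustrative. For these values $P\{X \ge 9\} = 11/1024 \le 0.05$ while $P\{X \ge 8\} = 56/1024 > 0.05$, so $c' = 8$ and the nonrandomized GLR test has size $11/1024$, strictly below $\alpha$.

```python
from math import comb, exp, log


def glr_binomial(x: int, n: int, p0: float) -> float:
    """lambda(x) for H0: p <= p0 against H1: p > p0 when X ~ b(n, p), 0 < p0 < 1."""
    if x <= n * p0:
        return 1.0                      # the unrestricted MLE x/n already lies in [0, p0]

    def log_kernel(p: float) -> float:
        # log of p^x (1 - p)^(n - x); the factor C(n, x) cancels in the ratio
        out = 0.0
        if x > 0:
            out += x * log(p)
        if n - x > 0:
            out += (n - x) * log(1.0 - p)
        return out

    return exp(log_kernel(p0) - log_kernel(x / n))


def upper_tail(c: int, n: int, p: float) -> float:
    """P_p{X > c}."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(c + 1, n + 1))


def glr_cutoff(n: int, p0: float, alpha: float) -> int:
    """Smallest integer c' with P_{p0}{X > c'} <= alpha, so that P_{p0}{X > c' - 1} > alpha."""
    c = -1
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c


if __name__ == "__main__":
    print([round(glr_binomial(x, 10, 0.3), 4) for x in range(11)])
    c = glr_cutoff(10, 0.5, 0.05)
    print(c, upper_tail(c, 10, 0.5), upper_tail(c - 1, 10, 0.5))
```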
The situation in Example 1 is not unique. For a one-parameter exponential family it can be shown (Birkes [6]) that a GLR test of $H_0$: $\theta \le \theta_0$ against $H_1$: $\theta > \theta_0$ is UMP of its size. The result holds also for the dual $H_0'$: $\theta \ge \theta_0$ and, in fact, for a much wider class of one-parameter families of distributions.

The GLR test is especially useful when $\theta$ is a multiparameter and we wish to test a hypothesis concerning one of the parameters. The remaining parameters act as nuisance parameters.


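Example 2. Consider the problem of testing $\mu = \mu_0$ against $\mu \ne \mu_0$ in sampling from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. In this case $\Theta_0 = \{(\mu_0, \sigma^2): \sigma^2 > 0\}$ and $\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty, \sigma^2 > 0\}$. We write $\theta = (\mu, \sigma^2)$:
$$\sup_{\theta\in\Theta_0} f_\theta(x) = \sup_{\sigma^2 > 0}\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{\sum_{i=1}^n (x_i - \mu_0)^2}{2\sigma^2}\right] = f_{(\mu_0, \hat\sigma_0^2)}(x),$$
where $\hat\sigma_0^2$ is the MLE, $\hat\sigma_0^2 = (1/n)\sum_{i=1}^n (x_i - \mu_0)^2$. Thus
$$\sup_{\theta\in\Theta_0} f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_{i=1}^n (x_i - \mu_0)^2\right]^{n/2}}.$$

The MLE of $\theta = (\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is $\left(\sum_{i=1}^n x_i/n, \sum_{i=1}^n (x_i - \bar x)^2/n\right)$. It follows that
$$\sup_{\theta\in\Theta} f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^{n/2}},$$
and
$$\lambda(x) = \left[\frac{\sum_{i=1}^n (x_i - \bar x)^2}{\sum_{i=1}^n (x_i - \mu_0)^2}\right]^{n/2} = \left[\frac{1}{1 + n(\bar x - \mu_0)^2/\sum_{i=1}^n (x_i - \bar x)^2}\right]^{n/2}.$$
The GLR test rejects $H_0$ if
$$\lambda(x) < c,$$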




and since $\lambda(x)$ is a decreasing function of $n(\bar x - \mu_0)^2/\sum_{i=1}^n (x_i - \bar x)^2$, we reject $H_0$ if
$$\frac{|\bar x - \mu_0|}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2}} > c',$$
that is, if
$$\left|\frac{\sqrt{n}(\bar x - \mu_0)}{s}\right| > c'',$$
where $s^2 = (n-1)^{-1}\sum_{i=1}^n (x_i - \bar x)^2$. Under $H_0$: $\mu = \mu_0$, the statistic
$$t(X) = \frac{\sqrt{n}(\bar X - \mu_0)}{S}$$
has a central $t(n-1)$ distribution, but under $H_1$: $\mu \ne \mu_0$, $t(X)$ has a noncentral $t$-distribution with $n - 1$ d.f. and noncentrality parameter $\delta = \sqrt{n}(\mu - \mu_0)/\sigma$. We choose $c'' = t_{n-1,\alpha/2}$ in accordance with the distribution of $t(X)$ under $H_0$. Note that the two-sided $t$-test obtained here is UMP unbiased. Similarly, one can obtain one-sided $t$-tests also as likelihood ratio tests.
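The link between $\lambda$ and the $t$ statistic in Example 2 can be checked numerically. From $n(\bar x - \mu_0)^2/\sum(x_i - \bar x)^2 = t^2/(n-1)$ one gets $-2\log\lambda = n\log[1 + t^2/(n-1)]$. The sketch below computes $\lambda$ directly from the two maximized log-likelihoods and compares it with this expression. It is a minimal Python illustration; the function names and the data are assumptions.

```python
from math import log, pi, sqrt


def normal_loglik(xs, mu, var):
    """log of prod_i (2 pi var)^(-1/2) exp[-(x_i - mu)^2 / (2 var)]."""
    n = len(xs)
    return -0.5 * n * log(2 * pi * var) - sum((x - mu) ** 2 for x in xs) / (2 * var)


def glr_mean_test(xs, mu0):
    """Return (-2 log lambda, t) for H0: mu = mu0 against H1: mu != mu0, sigma^2 unknown."""
    n = len(xs)
    xbar = sum(xs) / n
    var0_hat = sum((x - mu0) ** 2 for x in xs) / n    # MLE of sigma^2 on Theta_0
    var_hat = sum((x - xbar) ** 2 for x in xs) / n    # MLE of sigma^2 on Theta
    log_lam = normal_loglik(xs, mu0, var0_hat) - normal_loglik(xs, xbar, var_hat)
    s = sqrt(n * var_hat / (n - 1))                   # s^2 = sum (x_i - xbar)^2 / (n - 1)
    t = sqrt(n) * (xbar - mu0) / s
    return -2.0 * log_lam, t


if __name__ == "__main__":
    xs = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]
    stat, t = glr_mean_test(xs, 4.0)
    print(stat, len(xs) * log(1 + t * t / (len(xs) - 1)))   # the two numbers agree
```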


The computations in Example 2 could be slightly simplified by using Theorem 2. Indeed, $T(X) = (\bar X, S^2)$ is a minimal sufficient statistic for $\theta$, and since $\bar X$ and $S^2$ are independent, the likelihood is the product of the PDFs of $\bar X$ and $S^2$. We note that $\bar X \sim N(\mu, \sigma^2/n)$ and $S^2 \sim [\sigma^2/(n-1)]\chi^2_{n-1}$. We leave it to the reader to carry out the details.


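Example 3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. We wish to test the null hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$ against $H_1$: $\sigma_1^2 \ne \sigma_2^2$. Here
$$\Theta = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2): -\infty < \mu_i < \infty, \; \sigma_i^2 > 0, \; i = 1, 2\}$$
and
$$\Theta_0 = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2): -\infty < \mu_i < \infty, \; i = 1, 2, \; \sigma_1^2 = \sigma_2^2 > 0\}.$$

Let $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. Then the joint PDF is
$$f_\theta(x, y) = \frac{1}{(2\pi)^{(m+n)/2}\sigma_1^m\sigma_2^n}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_i - \mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^n (y_j - \mu_2)^2\right].$$
Also,
$$\log f_\theta(x, y) = -\frac{m+n}{2}\log(2\pi) - \frac{m}{2}\log\sigma_1^2 - \frac{n}{2}\log\sigma_2^2 - \frac{\sum_{i=1}^m (x_i - \mu_1)^2}{2\sigma_1^2} - \frac{\sum_{j=1}^n (y_j - \mu_2)^2}{2\sigma_2^2}.$$
Differentiating with respect to $\mu_1$ and $\mu_2$, we obtain the MLEs
$$\hat\mu_1 = \bar x, \qquad \hat\mu_2 = \bar y.$$
Differentiating with respect to $\sigma_1^2$ and $\sigma_2^2$, we obtain the MLEs
$$\hat\sigma_1^2 = \frac{1}{m}\sum_{i=1}^m (x_i - \bar x)^2, \qquad \hat\sigma_2^2 = \frac{1}{n}\sum_{j=1}^n (y_j - \bar y)^2.$$
If, however, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the MLE of $\sigma^2$ is
$$\hat\sigma^2 = \frac{\sum_{i=1}^m (x_i - \bar x)^2 + \sum_{j=1}^n (y_j - \bar y)^2}{m + n}.$$
Thus
$$\sup_{\theta\in\Theta_0} f_\theta(x, y) = \frac{e^{-(m+n)/2}}{[2\pi/(m+n)]^{(m+n)/2}\left[\sum_{i=1}^m (x_i - \bar x)^2 + \sum_{j=1}^n (y_j - \bar y)^2\right]^{(m+n)/2}}$$
and
$$\sup_{\theta\in\Theta} f_\theta(x, y) = \frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2}(2\pi/n)^{n/2}\left[\sum_{i=1}^m (x_i - \bar x)^2\right]^{m/2}\left[\sum_{j=1}^n (y_j - \bar y)^2\right]^{n/2}},$$
so that
$$\lambda(x, y) = \frac{(m+n)^{(m+n)/2}}{m^{m/2}n^{n/2}}\cdot\frac{\left[\sum_{i=1}^m (x_i - \bar x)^2\right]^{m/2}\left[\sum_{j=1}^n (y_j - \bar y)^2\right]^{n/2}}{\left[\sum_{i=1}^m (x_i - \bar x)^2 + \sum_{j=1}^n (y_j - \bar y)^2\right]^{(m+n)/2}}.$$
Now
$$\frac{\left[\sum (x_i - \bar x)^2\right]^{m/2}\left[\sum (y_j - \bar y)^2\right]^{n/2}}{\left[\sum (x_i - \bar x)^2 + \sum (y_j - \bar y)^2\right]^{(m+n)/2}} = \frac{1}{\left[1 + \sum (y_j - \bar y)^2/\sum (x_i - \bar x)^2\right]^{m/2}\left[1 + \sum (x_i - \bar x)^2/\sum (y_j - \bar y)^2\right]^{n/2}}.$$
Writing
$$f = \frac{\sum_{i=1}^m (x_i - \bar x)^2/(m-1)}{\sum_{j=1}^n (y_j - \bar y)^2/(n-1)},$$
we have
$$\lambda(x, y) = \left(\frac{m+n}{m}\right)^{m/2}\left(\frac{m+n}{n}\right)^{n/2}\frac{1}{\{1 + (n-1)/[(m-1)f]\}^{m/2}\{1 + (m-1)f/(n-1)\}^{n/2}}.$$

We leave it to the reader to check that $\lambda(x, y) < c$ is equivalent to $f < c_1$ or $f > c_2$. (Take logarithms and use properties of convex functions. Alternatively, differentiate $\log\lambda$.)

Under $H_0$, the statistic
$$F = \frac{\sum_{i=1}^m (X_i - \bar X)^2/(m-1)}{\sum_{j=1}^n (Y_j - \bar Y)^2/(n-1)}$$
has an $F(m-1, n-1)$ distribution, so that $c_1, c_2$ can be selected. It is usual to take
$$P\{F \le c_1\} = P\{F \ge c_2\} = \frac{\alpha}{2}.$$
Under $H_1$, $(\sigma_2^2/\sigma_1^2)F$ has an $F(m-1, n-1)$ distribution.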
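The reduction of $\lambda(x, y)$ in Example 3 to a function of the variance ratio $f$ alone can be checked numerically. The sketch below computes $\log\lambda$ once from the two suprema and once from the closed form in $f$. It is a minimal Python illustration; the function names and both data sets are assumptions made for the illustration.

```python
from math import log


def centered_ss(v):
    """sum (v_i - vbar)^2."""
    vbar = sum(v) / len(v)
    return sum((a - vbar) ** 2 for a in v)


def log_glr_direct(xs, ys):
    """log lambda(x, y) from the two suprema; (2 pi)^(-(m+n)/2) e^(-(m+n)/2) cancels."""
    m, n = len(xs), len(ys)
    a, b = centered_ss(xs), centered_ss(ys)
    log_sup_null = -0.5 * (m + n) * log((a + b) / (m + n))       # common-variance MLE
    log_sup_full = -0.5 * m * log(a / m) - 0.5 * n * log(b / n)  # separate-variance MLEs
    return log_sup_null - log_sup_full


def log_glr_from_f(f, m, n):
    """log lambda written as a function of f = [a/(m-1)] / [b/(n-1)] only."""
    return (0.5 * m * log((m + n) / m) + 0.5 * n * log((m + n) / n)
            - 0.5 * m * log(1.0 + (n - 1) / ((m - 1) * f))
            - 0.5 * n * log(1.0 + (m - 1) * f / (n - 1)))


def f_ratio(xs, ys):
    return (centered_ss(xs) / (len(xs) - 1)) / (centered_ss(ys) / (len(ys) - 1))


if __name__ == "__main__":
    xs = [2.1, 3.4, 1.9, 4.4, 3.0, 2.7]
    ys = [5.2, 4.1, 6.3, 5.9, 3.8, 5.0, 4.6]
    f = f_ratio(xs, ys)
    print(f, log_glr_direct(xs, ys), log_glr_from_f(f, len(xs), len(ys)))
```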


In Example 3 we can obtain the same GLR test by focusing attention on the joint sufficient statistic $(\bar X, \bar Y, S_1^2, S_2^2)$, where $S_1^2$ and $S_2^2$ are sample variances of the $X$'s and the $Y$'s, respectively. In order to write down the likelihood function, we note that $\bar X, \bar Y, S_1^2, S_2^2$ are independent RVs. The distributions of $\bar X$ and $S_1^2$ are the same as in Example 2 except that $m$ is the sample size. Distributions of $\bar Y$ and $S_2^2$ require appropriate modifications. We leave it to the reader to carry out the details. It turns out that the GLR test coincides with the UMP unbiased test in this case.

In certain situations the GLR test does not perform well. We reproduce here an 
example due to Stein and Rubin. 


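Example 4. Let $X$ be a discrete RV with PMF
$$P_0\{X = x\} = \begin{cases} \alpha/2 & \text{if } x = \pm 2, \\ 1/2 - \alpha & \text{if } x = \pm 1, \\ \alpha & \text{if } x = 0, \end{cases}$$
under the null hypothesis $H_0$: $p = 0$, and
$$P_p\{X = x\} = \begin{cases} pc & \text{if } x = -2, \\ \dfrac{1-c}{1-\alpha}\left(\dfrac12 - \alpha\right) & \text{if } x = \pm 1, \\[1ex] \alpha\left(\dfrac{1-c}{1-\alpha}\right) & \text{if } x = 0, \\ (1-p)c & \text{if } x = 2, \end{cases}$$
under the alternative $H_1$: $p \in (0, 1)$, where $\alpha$ and $c$ are constants with
$$0 < \alpha < \frac12 \quad\text{and}\quad \frac{\alpha}{2-\alpha} < c < \alpha.$$

To test the simple null hypothesis against the composite alternative at the level of significance $\alpha$, let us compute the likelihood ratio $\lambda$. We have
$$\lambda(2) = \frac{\alpha/2}{c} = \frac{\alpha}{2c},$$
since $\alpha/2 < c$ (because $c > \alpha/(2-\alpha) > \alpha/2$). Similarly, $\lambda(-2) = \alpha/(2c)$. Also,
$$\lambda(1) = \lambda(-1) = \frac{1/2 - \alpha}{\frac{1-c}{1-\alpha}(1/2 - \alpha)} = \frac{1-\alpha}{1-c}$$
and
$$\lambda(0) = \frac{1-\alpha}{1-c}.$$

The GLR test rejects $H_0$ if $\lambda(x) < k$, where $k$ is to be determined so that the level is $\alpha$. We see that
$$P_0\left\{\lambda(X) \le \frac{\alpha}{2c}\right\} = P_0\{X = \pm 2\} = \alpha,$$
provided that $\alpha/(2c) < (1-\alpha)/(1-c)$. But $\alpha/(2-\alpha) < c < \alpha$ implies that $\alpha < 2c - c\alpha$, so that $\alpha - c\alpha < 2c - 2c\alpha$, or $\alpha(1-c) < 2c(1-\alpha)$, as required. Thus the GLR size $\alpha$ test is to reject $H_0$ if $X = \pm 2$. The power of the GLR test is
$$P_p\left\{\lambda(X) \le \frac{\alpha}{2c}\right\} = P_p\{X = \pm 2\} = pc + (1-p)c = c < \alpha$$
for all $p \in (0, 1)$. The test is not unbiased and is even worse than the trivial test $\varphi(x) \equiv \alpha$.

Another test that is better than the trivial test is to reject $H_0$ whenever $x = 0$ (this is opposite to what the likelihood ratio test says). Then
$$P_0\{X = 0\} = \alpha \quad\text{and}\quad P_p\{X = 0\} = \alpha\,\frac{1-c}{1-\alpha} > \alpha \quad (\text{since } c < \alpha)$$
for all $p \in (0, 1)$, and the test is unbiased.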
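The Stein-Rubin example can be verified with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` with the illustrative choice $\alpha = 1/5$, $c = 3/20$, which satisfies $\alpha/(2-\alpha) = 1/9 < c < \alpha$. It computes $\lambda(x)$ on the five support points, forms the GLR rejection region $\{x: \lambda(x) \text{ minimal}\}$, and compares its power with that of the test rejecting at $x = 0$. The function names and the particular $(\alpha, c)$ are assumptions made for the illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
SUPPORT = (-2, -1, 0, 1, 2)


def null_pmf(alpha):
    """P_0{X = x}."""
    return {-2: alpha / 2, -1: HALF - alpha, 0: alpha, 1: HALF - alpha, 2: alpha / 2}


def alt_pmf(p, alpha, c):
    """P_p{X = x} for 0 < p < 1."""
    r = (1 - c) / (1 - alpha)
    return {-2: p * c, -1: r * (HALF - alpha), 0: r * alpha, 1: r * (HALF - alpha), 2: (1 - p) * c}


def glr(x, alpha, c):
    """lambda(x) = P_0{X = x} / max(P_0{X = x}, sup over 0 < p < 1 of P_p{X = x})."""
    p_null = null_pmf(alpha)[x]
    if x in (-2, 2):
        sup_alt = c                           # sup of p*c or (1 - p)*c (not attained)
    else:
        sup_alt = alt_pmf(HALF, alpha, c)[x]  # free of p for x in {-1, 0, 1}
    return p_null / max(p_null, sup_alt)


def glr_rejection_region(alpha, c):
    """Support points with the smallest value of lambda."""
    lam = {x: glr(x, alpha, c) for x in SUPPORT}
    smallest = min(lam.values())
    return {x for x in SUPPORT if lam[x] == smallest}


def prob(region, pmf):
    return sum((pmf[x] for x in region), Fraction(0))


if __name__ == "__main__":
    alpha, c = Fraction(1, 5), Fraction(3, 20)
    region = glr_rejection_region(alpha, c)
    print(region, prob(region, null_pmf(alpha)))             # size of the GLR test
    print(prob(region, alt_pmf(Fraction(1, 3), alpha, c)))   # its power: c < alpha
    print(prob({0}, alt_pmf(Fraction(1, 3), alpha, c)))      # power of "reject at x = 0"
```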


We will use the generalized likelihood ratio procedure quite frequently hereafter because of its simplicity and wide applicability. The exact distribution of the test statistic under $H_0$ is generally difficult to obtain (despite what we saw in Examples 1 to 3 above), and evaluation of the power function is also not possible in many problems. Recall, however, that under certain conditions the asymptotic distribution of the MLE is normal. This result can be used to prove the following large-sample property of the GLR under $H_0$, which solves the problem of computing the cutoff point $c$, at least when the sample size is large.


Theorem 3. Under some regularity conditions on $f_\theta(x)$, the random variable $-2\log\lambda(X)$ under $H_0$ is asymptotically distributed as a chi-square RV with degrees of freedom equal to the difference between the number of independent parameters in $\Theta$ and the number in $\Theta_0$.


We will not prove this result here; the reader is referred to Wilks [117, p. 419]. The regularity conditions are essentially those associated with Theorem 8.7.4. In Example 2 the number of parameters unspecified under $H_0$ is 1 (namely, $\sigma^2$), and under $H_1$ two parameters are unspecified ($\mu$ and $\sigma^2$), so that the asymptotic chi-square distribution will have 1 d.f. Similarly, in Example 3, the d.f. $= 4 - 3 = 1$.


Example 5. In Example 2 we showed that in sampling from a normal population with unknown mean $\mu$ and unknown variance $\sigma^2$, the likelihood ratio for testing $H_0$: $\mu = \mu_0$ against $H_1$: $\mu \ne \mu_0$ is
$$\lambda(x) = \left[1 + \frac{n(\bar x - \mu_0)^2}{\sum_{i=1}^n (x_i - \bar x)^2}\right]^{-n/2}.$$
Thus
$$-2\log\lambda(X) = n\log\left[1 + \frac{n(\bar X - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2}\right].$$

Under $H_0$, $\sqrt{n}(\bar X - \mu_0)/\sigma \sim N(0, 1)$ and $\sum_{i=1}^n (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n-1)$. Also,
$$\frac{\sum_{i=1}^n (X_i - \bar X)^2}{(n-1)\sigma^2} \xrightarrow{P} 1.$$
It follows that if $Z \sim N(0, 1)$, then $-2\log\lambda(X)$ has the same limiting distribution as $n\log[1 + Z^2/(n-1)]$. Moreover,
$$\left(1 + \frac{Z^2}{n-1}\right)^n \xrightarrow{L} \exp(Z^2),$$
and since logarithm is a continuous function, we see that
$$n\log\left(1 + \frac{Z^2}{n-1}\right) \xrightarrow{L} Z^2.$$
Thus $-2\log\lambda(X) \xrightarrow{L} Y$, where $Y \sim \chi^2(1)$. This result is consistent with Theorem 3.
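Theorem 3 can be illustrated with a seeded simulation for Example 5. Under $H_0$, reject when $-2\log\lambda(X) = n\log[1 + n(\bar X - \mu_0)^2/\sum(X_i - \bar X)^2]$ exceeds the upper 5% point of $\chi^2(1)$. The rejection rate should then be close to $0.05$ for moderate $n$. The sketch below is a minimal Monte Carlo illustration in Python, not a proof. The names, the seed, $n = 40$ and the replication count are assumptions. It takes the $\chi^2(1)$ point as the square of the upper 2.5% point of $N(0, 1)$, which holds because $Z^2 \sim \chi^2(1)$.

```python
import math
import random
from statistics import NormalDist

# Upper 5% point of chi^2(1): the square of the upper 2.5% point of N(0, 1).
CHI2_1_UPPER_5PCT = NormalDist().inv_cdf(0.975) ** 2


def minus_two_log_lambda(xs, mu0):
    """-2 log lambda = n log[1 + n (xbar - mu0)^2 / sum (x_i - xbar)^2]."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return n * math.log1p(n * (xbar - mu0) ** 2 / ss)


def asymptotic_test_size(n, reps=10_000, seed=7):
    """Rejection rate of '-2 log lambda > chi^2_{1,0.05}' when H0 (mu = mu0 = 0) holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if minus_two_log_lambda(xs, 0.0) > CHI2_1_UPPER_5PCT:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(CHI2_1_UPPER_5PCT, asymptotic_test_size(40))
```

For finite $n$ the test based on the exact $t(n-1)$ distribution is preferable. The chi-square cutoff is slightly liberal here because $-2\log\lambda$ is a monotone function of $t^2$ and $t(n-1)$ has heavier tails than $N(0, 1)$.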


PROBLEMS 10.2 


1. 
2. 


6. 


Prove Theorems 1 and 2. 


A random sample of size n is taken from the PMF P(X; = xj) = pj, j= 
1,2,3,4,0 < pj < 1, 3 pj = 1. Find the form of the GLR test of 


Ho: pi = p2 = рз = pa = 1 against Ну: р = p2 = р/2, рз = pa 
(1— р)/2,0 <р « 1. 


. Find the GLR test of Ho: p = po against Hı: p # po, based on a sample of 


size 1 from b(n, p). 


Let X1, X2, ... , Xn be a sample from Л (и, 02), where both и. and е? are un- 
known. Find the GLR test of Ho: o = o against Hy: o # оо. 


5. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF
$$P_N\{X = j\} = \frac{1}{N}, \quad j = 1, 2, \ldots, N,\ N > 1 \text{ an integer}.$$
(a) Find the GLR test of $H_0: N \le N_0$ against $H_1: N > N_0$.
(b) Find the GLR test of $H_0: N = N_0$ against $H_1: N \neq N_0$.

6. For a sample of size 1 from the PDF
$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \quad 0 < x < \theta,$$
find the GLR test of $\theta = \theta_0$ against $\theta \neq \theta_0$.


7. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \beta)$.
(a) Find the GLR test of $\beta = \beta_0$ against $\beta \neq \beta_0$.
(b) Find the GLR test of $\beta \le \beta_0$ against $\beta > \beta_0$.

8. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal population with $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \sigma_1^2$, $\operatorname{var}(Y_i) = \sigma_2^2$, and $\operatorname{cov}(X_i, Y_i) = \rho\sigma_1\sigma_2$. Show that the likelihood ratio test of the null hypothesis $H_0: \rho = 0$ against $H_1: \rho \neq 0$ reduces to rejecting $H_0$ if $|R| > c$, where $R = S_{11}/(S_1 S_2)$, with $S_{11}$, $S_1^2$, and $S_2^2$ being the sample covariance and the sample variances, respectively. (For the PDF of the test statistic $R$, see Problem 7.7.1.)
9. Let $X_1, X_2, \ldots, X_m$ be iid $G(1, \theta)$ RVs and let $Y_1, Y_2, \ldots, Y_n$ be iid $G(1, \mu)$ RVs, where $\theta$ and $\mu$ are unknown positive real numbers. Assume that the $X$'s and the $Y$'s are independent. Develop an $\alpha$-level GLR test for testing $H_0: \theta = \mu$ against $H_1: \theta \neq \mu$.


10. A die is tossed 60 times in order to test $H_0: P\{j\} = 1/6$, $j = 1, 2, \ldots, 6$ (die is fair) against $H_1: P\{2\} = P\{4\} = P\{6\} = 2/9$, $P\{1\} = P\{3\} = P\{5\} = 1/9$. Find the GLR test.


11. Let $X_1, X_2, \ldots, X_n$ be iid with the common PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$, and $= 0$ otherwise. Find the level-$\alpha$ GLR test for testing $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.

12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with the common Pareto PDF $f_\theta(x) = \theta/x^2$ for $x > \theta$, and $= 0$ elsewhere. Show that the family of joint PDFs has MLR in $X_{(1)}$ and find a size-$\alpha$ test of $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$. Show that the GLR test coincides with the UMP test.


10.3 CHI-SQUARE TESTS 


In this section we consider a variety of tests where the test statistic has an exact 
or a limiting chi-square distribution. Chi-square tests are also used for testing some 
nonparametric hypotheses and are taken up again in Chapter 13. 

We begin with tests concerning variances in sampling from a normal population. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where $\sigma^2$ is unknown. We wish to test a hypothesis of the type $\sigma \ge \sigma_0$, $\sigma \le \sigma_0$, or $\sigma = \sigma_0$, where $\sigma_0$ is some given positive number. We summarize the tests in the following table:





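Reject $H_0$ at level $\alpha$ if:

I. $H_0: \sigma \ge \sigma_0$, $H_1: \sigma < \sigma_0$.
   $\mu$ known: $\sum_{i=1}^n (x_i - \mu)^2 \le \sigma_0^2\chi^2_{n,1-\alpha}$.
   $\mu$ unknown: $s^2 \le \frac{\sigma_0^2}{n-1}\chi^2_{n-1,1-\alpha}$.

II. $H_0: \sigma \le \sigma_0$, $H_1: \sigma > \sigma_0$.
   $\mu$ known: $\sum_{i=1}^n (x_i - \mu)^2 \ge \sigma_0^2\chi^2_{n,\alpha}$.
   $\mu$ unknown: $s^2 \ge \frac{\sigma_0^2}{n-1}\chi^2_{n-1,\alpha}$.

III. $H_0: \sigma = \sigma_0$, $H_1: \sigma \neq \sigma_0$.
   $\mu$ known: $\sum_{i=1}^n (x_i - \mu)^2 \le \sigma_0^2\chi^2_{n,1-\alpha/2}$ or $\sum_{i=1}^n (x_i - \mu)^2 \ge \sigma_0^2\chi^2_{n,\alpha/2}$.
   $\mu$ unknown: $s^2 \le \frac{\sigma_0^2}{n-1}\chi^2_{n-1,1-\alpha/2}$ or $s^2 \ge \frac{\sigma_0^2}{n-1}\chi^2_{n-1,\alpha/2}$.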





Remark 1. All these tests can be derived by the standard likelihood ratio procedure. If $\mu$ is unknown, tests I and II are UMP unbiased (and UMP invariant). If $\mu$ is known, tests I and II are UMP (see Example 9.4.5). For test III we have chosen the two critical constants so that each tail has probability $\alpha/2$. This is the customary procedure, even though it destroys the unbiasedness property of the tests, at least for small samples.


Example 1. A manufacturer claims that the lifetime of a certain brand of batteries produced by his factory has a variance of 5000 (hours)². A sample of size 26 has a variance of 7200 (hours)². Assuming that it is reasonable to treat these data as a random sample from a normal population, let us test the manufacturer's claim at the $\alpha = 0.02$ level. Here $H_0: \sigma^2 = 5000$ is to be tested against $H_1: \sigma^2 \neq 5000$. We reject $H_0$ if either
$$s^2 = 7200 < \frac{\sigma_0^2}{n-1}\chi^2_{n-1,1-\alpha/2} \quad\text{or}\quad s^2 > \frac{\sigma_0^2}{n-1}\chi^2_{n-1,\alpha/2}.$$
We have
$$\frac{\sigma_0^2}{n-1}\chi^2_{n-1,1-\alpha/2} = \frac{5000}{25}\times 11.524 = 2304.8$$
and
$$\frac{\sigma_0^2}{n-1}\chi^2_{n-1,\alpha/2} = \frac{5000}{25}\times 44.314 = 8862.8.$$
Since $s^2$ is neither $< 2304.8$ nor $> 8862.8$, we cannot reject the manufacturer's claim at the 0.02 level.
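The arithmetic of Example 1 can be reproduced with a small Python helper. This is a sketch: the function name is ours, and the chi-square percentage points are passed in from a table rather than computed, so the example needs only plain Python.

```python
def two_sided_variance_test(s2, n, sigma0_sq, chi2_lower, chi2_upper):
    """Test III with mu unknown: H0: sigma^2 = sigma0^2 vs H1: sigma^2 != sigma0^2.

    chi2_lower is chi^2_{n-1, 1-alpha/2} and chi2_upper is chi^2_{n-1, alpha/2},
    i.e. upper-tail percentage points read from a chi-square table.
    Returns (lower cutoff, upper cutoff, reject?).
    """
    scale = sigma0_sq / (n - 1)
    lower, upper = scale * chi2_lower, scale * chi2_upper
    reject = s2 < lower or s2 > upper
    return lower, upper, reject

# Example 1: n = 26, s^2 = 7200, sigma0^2 = 5000, alpha = 0.02 (table values for 25 d.f.)
lower, upper, reject = two_sided_variance_test(7200, 26, 5000, 11.524, 44.314)
print(round(lower, 1), round(upper, 1), reject)  # 2304.8 8862.8 False
```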


A test based on a chi-square statistic is also used for testing the equality of several proportions. Let $X_1, X_2, \ldots, X_k$ be independent RVs with $X_i \sim b(n_i, p_i)$, $i = 1, 2, \ldots, k$, $k \ge 2$.

Theorem 1. The RV $\sum_{i=1}^k \left[(X_i - n_i p_i)/\sqrt{n_i p_i(1 - p_i)}\right]^2$ converges in distribution to the $\chi^2(k)$ RV as $n_1, n_2, \ldots, n_k \to \infty$.


The proof is left as an exercise. 


If $n_1, n_2, \ldots, n_k$ are large, we can use Theorem 1 to test $H_0: p_1 = p_2 = \cdots = p_k = p$ against all alternatives. If $p$ is known, we compute
$$y = \sum_{i=1}^k \left[\frac{x_i - n_i p}{\sqrt{n_i p(1-p)}}\right]^2,$$
and if $y \ge \chi^2_{k,\alpha}$ we reject $H_0$. In practice, $p$ will be unknown. Let $\mathbf{p} = (p_1, p_2, \ldots, p_k)$. Then the likelihood function is
$$L(\mathbf{p}; x_1, \ldots, x_k) = \prod_{i=1}^k \left[\binom{n_i}{x_i} p_i^{x_i}(1 - p_i)^{n_i - x_i}\right],$$
so that
$$\log L(\mathbf{p}; \mathbf{x}) = \sum_{i=1}^k \log\binom{n_i}{x_i} + \sum_{i=1}^k x_i \log p_i + \sum_{i=1}^k (n_i - x_i)\log(1 - p_i).$$
The MLE $\hat{p}$ of $p$ under $H_0$ is therefore given by
$$\frac{\sum_{i=1}^k x_i}{\hat{p}} - \frac{\sum_{i=1}^k (n_i - x_i)}{1 - \hat{p}} = 0,$$
that is,
$$\hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}.$$
Under certain regularity assumptions (see Cramér [16, pp. 426–427]) it can be shown that the statistic
$$Y_1 = \sum_{i=1}^k \frac{(X_i - n_i\hat{p})^2}{n_i\hat{p}(1 - \hat{p})}$$
is asymptotically $\chi^2(k-1)$. Thus the test rejects $H_0: p_1 = p_2 = \cdots = p_k = p$, $p$ unknown, at level $\alpha$ if $y_1 \ge \chi^2_{k-1,\alpha}$.

It should be remembered that the tests based on Theorem 1 are all large-sample tests and hence not exact, in contrast to the tests concerning the variance discussed above, which are all exact tests. In the case $k = 1$, UMP tests of $p \le p_0$ and of $p \ge p_0$ exist and can be obtained by the MLR method described in Section 9.4. For testing $p = p_0$, the usual test is UMP unbiased.

In the case $k = 2$, if $n_1$ and $n_2$ are large, a test based on the normal distribution can be used instead of Theorem 1. In this case the statistic
$$Z = \frac{X_1/n_1 - X_2/n_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}, \tag{2}$$
where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$, is asymptotically $N(0,1)$ under $H_0: p_1 = p_2 = p$. If $p$ is known, one uses $p$ instead of $\hat{p}$. It is not too difficult to show that $Z^2$ is equal to $Y_1$, so that the two tests are equivalent.
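The equivalence $Z^2 = Y_1$ is easy to check numerically. The Python sketch below evaluates statistic (2) and the pooled chi-square statistic $Y_1$ on hypothetical counts; the function names are ours.

```python
import math

def pooled_chi_square(xs, ns):
    """Y1 = sum (x_i - n_i p)^2 / (n_i p (1 - p)) with the pooled MLE p = sum x / sum n."""
    p = sum(xs) / sum(ns)
    return sum((x - n * p) ** 2 / (n * p * (1 - p)) for x, n in zip(xs, ns))

def two_sample_z(x1, n1, x2, n2):
    """Statistic (2): difference of sample proportions over its pooled standard error."""
    p = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Hypothetical counts: 30 successes out of 100 and 45 out of 120.
z = two_sample_z(30, 100, 45, 120)
y1 = pooled_chi_square([30, 45], [100, 120])
print(z * z, y1)  # equal up to rounding
```

The two-sided $Z$-test at level $\alpha$ and the $Y_1$-test with $\chi^2_{1,\alpha} = z_{\alpha/2}^2$ therefore reject for exactly the same samples.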

For small samples the Fisher–Irwin test is commonly used. It is based on the conditional distribution of $X_1$ given $T = X_1 + X_2$. Let $\rho = [p_1(1 - p_2)]/[p_2(1 - p_1)]$. Then
$$P\{X_1 + X_2 = t\} = \sum_{j=0}^{t}\binom{n_1}{j}p_1^j(1-p_1)^{n_1-j}\binom{n_2}{t-j}p_2^{t-j}(1-p_2)^{n_2-t+j} = \sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^j\, a(n_1, n_2),$$
where
$$a(n_1, n_2) = (1-p_1)^{n_1}(1-p_2)^{n_2}\left(\frac{p_2}{1-p_2}\right)^t.$$
It follows that
$$P\{X_1 = x \mid X_1 + X_2 = t\} = \frac{\binom{n_1}{x}p_1^x(1-p_1)^{n_1-x}\binom{n_2}{t-x}p_2^{t-x}(1-p_2)^{n_2-t+x}}{a(n_1,n_2)\sum_{j=0}^t\binom{n_1}{j}\binom{n_2}{t-j}\rho^j} = \frac{\binom{n_1}{x}\binom{n_2}{t-x}\rho^x}{\sum_{j=0}^t\binom{n_1}{j}\binom{n_2}{t-j}\rho^j}.$$
On the boundary of any of the hypotheses $p_1 = p_2$, $p_1 \le p_2$, or $p_1 \ge p_2$, we note that $\rho = 1$, so that
$$P\{X_1 = x \mid X_1 + X_2 = t\} = \frac{\binom{n_1}{x}\binom{n_2}{t-x}}{\binom{n_1+n_2}{t}},$$
which is a hypergeometric distribution. For testing $H_0: p_1 \ge p_2$ against $H_1: p_1 < p_2$, this conditional test rejects if $X_1 \le k(t)$, where $k(t)$ is the largest integer for which $P\{X_1 \le k(t) \mid T = t\} \le \alpha$. Obvious modifications yield critical regions for testing $p_1 = p_2$ and $p_1 \le p_2$ against the corresponding alternatives.
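The conditional test can be carried out exactly. The Python sketch below evaluates the boundary ($\rho = 1$) hypergeometric distribution and finds the largest $k(t)$ with $P\{X_1 \le k(t) \mid T = t\} \le \alpha$. This is the lower-tail critical value, which applies because small values of $X_1$ point toward $p_1 < p_2$. The function names are ours, `fractions.Fraction` keeps the probabilities exact, and the sample sizes and total are hypothetical.

```python
from fractions import Fraction
from math import comb

def conditional_pmf(x, n1, n2, t):
    """P{X1 = x | X1 + X2 = t} when rho = 1: the hypergeometric PMF, as an exact fraction."""
    return Fraction(comb(n1, x) * comb(n2, t - x), comb(n1 + n2, t))

def support(n1, n2, t):
    """Possible values of X1 given X1 + X2 = t."""
    return range(max(0, t - n2), min(t, n1) + 1)

def lower_tail(k, n1, n2, t):
    """P{X1 <= k | T = t} under rho = 1."""
    return sum(conditional_pmf(x, n1, n2, t) for x in support(n1, n2, t) if x <= k)

def critical_value(n1, n2, t, alpha):
    """Largest k in the support with P{X1 <= k | T = t} <= alpha, or None if there is none."""
    k = None
    for x in support(n1, n2, t):
        if lower_tail(x, n1, n2, t) <= alpha:
            k = x
        else:
            break  # the lower tail only grows with x
    return k

# Hypothetical sizes: n1 = n2 = 12 with t = 10 successes in total, alpha = 0.05.
k = critical_value(12, 12, 10, 0.05)
print(k)  # 2: reject when X1 <= 2
```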


In applications a wide variety of problems can be reduced to the multinomial distribution model. We therefore consider the problem of testing the parameters of a multinomial distribution. Let $(X_1, X_2, \ldots, X_{k-1})$ be a sample from a multinomial distribution with parameters $n, p_1, p_2, \ldots, p_{k-1}$, and let us write $X_k = n - X_1 - \cdots - X_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$. The difference between the model of Theorem 1 and the multinomial model lies in independence: the $X_i$'s of Theorem 1 are independent, whereas the components of a multinomial RV are not.


Theorem 2. Let $(X_1, X_2, \ldots, X_{k-1})$ be a multinomial RV with parameters $n, p_1, p_2, \ldots, p_{k-1}$. Then the RV
$$U_k = \sum_{i=1}^k \frac{(X_i - np_i)^2}{np_i} \tag{3}$$
is asymptotically distributed as a $\chi^2(k-1)$ RV (as $n \to \infty$).


Proof. For the general proof we refer the reader to Cramér [16, pp. 417–419] or Ferguson [26, p. 61]. We will consider here the $k = 2$ case to make the result a little more plausible. We have
$$\begin{aligned}
U_2 &= \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2} = \frac{(X_1 - np_1)^2}{np_1} + \frac{[n - X_1 - n(1 - p_1)]^2}{n(1 - p_1)} \\
&= (X_1 - np_1)^2\left[\frac{1}{np_1} + \frac{1}{n(1 - p_1)}\right] = \frac{(X_1 - np_1)^2}{np_1(1 - p_1)}.
\end{aligned}$$
It follows from Theorem 1 that $U_2 \xrightarrow{L} Y$ as $n \to \infty$, where $Y \sim \chi^2(1)$.

To use Theorem 2 to test $H_0: p_1 = p_1^0, \ldots, p_k = p_k^0$, we need only compute the quantity
$$u = \sum_{i=1}^k \frac{(x_i - np_i^0)^2}{np_i^0}$$
from the sample; if $n$ is large, we reject $H_0$ if $u > \chi^2_{k-1,\alpha}$.


Example 2. A die is rolled 120 times with the following results:

Result      1    2    3    4    5    6
Frequency   20   30   20   25   15   10

Let us test the hypothesis that the die is fair at level $\alpha = 0.05$. The null hypothesis is $H_0: p_i = 1/6$, $i = 1, 2, \ldots, 6$, where $p_i$ is the probability that the face value is $i$, $1 \le i \le 6$. By Theorem 2, we reject $H_0$ if
$$\sum_{i=1}^6 \frac{[x_i - 120(1/6)]^2}{120(1/6)} > \chi^2_{5,0.05}.$$
We have
$$u = \frac{0^2 + 10^2 + 0^2 + 5^2 + 5^2 + 10^2}{20} = 12.5.$$
Since $\chi^2_{5,0.05} = 11.07$, we reject $H_0$. Note that if we choose $\alpha = 0.025$, then $\chi^2_{5,0.025} = 12.8$, and we cannot reject at this level.
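The statistic $u$ of Theorem 2 takes only a few lines of Python. In this sketch the function name is ours, and the critical values 11.07 and 12.8 are copied from the text rather than computed.

```python
def pearson_chi_square(observed, probs):
    """u = sum (x_i - n p_i)^2 / (n p_i) for fully specified cell probabilities (Theorem 2)."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))

counts = [20, 30, 20, 25, 15, 10]    # Example 2: 120 rolls of a die
u = pearson_chi_square(counts, [1 / 6] * 6)
# Upper percentage points of chi^2(5), copied from the text.
for alpha, crit in [(0.05, 11.07), (0.025, 12.8)]:
    print(alpha, round(u, 2), "reject" if u > crit else "do not reject")
```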


Theorem 2 has much wider applicability, and we will later study its application 
to contingency tables. Here we consider the application of Theorem 2 to testing the 
null hypothesis that the DF of an RV X has a specified form. 


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample on $X$. Also, let $H_0: X \sim F$, where the functional form of the DF $F$ is known completely. Consider a collection of disjoint Borel sets $A_1, A_2, \ldots, A_k$ that form a partition of the real line. Let $P\{X \in A_i\} = p_i$, $i = 1, 2, \ldots, k$, and assume that $p_i > 0$ for each $i$. Let $Y_j$ = number of $X_i$'s in $A_j$, $j = 1, 2, \ldots, k$, $i = 1, 2, \ldots, n$. Then the joint distribution of $(Y_1, Y_2, \ldots, Y_{k-1})$ is multinomial with parameters $n, p_1, p_2, \ldots, p_{k-1}$. Clearly, $Y_k = n - Y_1 - \cdots - Y_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$.

The proof of Theorem 3 is obvious. One frequently selects $A_1, A_2, \ldots, A_k$ as disjoint intervals. Theorem 3 requires $F$ to be completely specified; frequently, however, one or more of the parameters associated with the DF $F$ are unknown. In that case the following result is useful.


Theorem 4. Let $H_0: X \sim F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ is unknown. Let $X_1, X_2, \ldots, X_n$ be independent observations on $X$, and suppose that the MLEs of $\theta_1, \theta_2, \ldots, \theta_r$ exist and are, respectively, $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r$. Let $A_1, A_2, \ldots, A_k$ be a collection of disjoint Borel sets that cover the real line, and let
$$\hat{p}_i = P_{\hat\theta}\{X \in A_i\} > 0, \quad i = 1, 2, \ldots, k,$$
where $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r)$, and $P_\theta$ is the probability distribution associated with $F_\theta$. Let $Y_1, Y_2, \ldots, Y_k$ be the RVs defined as follows: $Y_i$ = number of $X_1, X_2, \ldots, X_n$ in $A_i$, $i = 1, 2, \ldots, k$.

Then the RV
$$U_k = \sum_{i=1}^k \frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i}$$
is asymptotically distributed as a $\chi^2(k - r - 1)$ RV (as $n \to \infty$).


The proof of Theorem 4 and some regularity conditions required on $F_\theta$ are given in Rao [86, pp. 391–392].

To test $H_0: X \sim F$, where $F$ is completely specified, we reject $H_0$ if
$$u = \sum_{i=1}^k \frac{(y_i - np_i)^2}{np_i} > \chi^2_{k-1,\alpha},$$
provided that $n$ is sufficiently large. If the null hypothesis is $H_0: X \sim F_\theta$, where $F_\theta$ is known except for the parameter $\theta$, we use Theorem 4 and reject $H_0$ if
$$v = \sum_{i=1}^k \frac{(y_i - n\hat{p}_i)^2}{n\hat{p}_i} > \chi^2_{k-r-1,\alpha},$$
where $r$ is the number of parameters estimated.


Example 3. The following data were obtained from a table of random numbers of normal distribution with mean 0 and variance 1.

 0.464    0.137    2.455   −0.323   −0.068
 0.906   −0.513   −0.525    0.595    0.881
−0.482    1.678   −0.057   −1.229    0.486
−1.787    0.261    1.237    1.046   −0.508

We want to test the null hypothesis that the DF $F$ from which the data came is normal with mean 0 and variance 1. Here $F$ is completely specified. Let us choose three intervals $(-\infty, -0.5]$, $(-0.5, 0.5]$, and $(0.5, \infty)$. We see that $y_1 = 5$, $y_2 = 8$, and $y_3 = 7$.

Also, if $Z$ is $N(0,1)$, then $p_1 = 0.3085$, $p_2 = 0.3830$, and $p_3 = 0.3085$. Thus
$$u = \sum_{i=1}^3 \frac{(y_i - np_i)^2}{np_i} = \frac{(5 - 20\times 0.3085)^2}{6.17} + \frac{(8 - 20\times 0.383)^2}{7.66} + \frac{(7 - 20\times 0.3085)^2}{6.17} < 1.$$
Also, $\chi^2_{2,0.05} = 5.99$, so we cannot reject $H_0$ at level 0.05.
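The cell counts and probabilities of Example 3 can be recomputed in Python with `statistics.NormalDist` (Python 3.8 or later) supplying the $N(0,1)$ DF. This is a sketch with the cell boundaries hard-coded to those chosen in the text. The statistic comes out near 0.35, in line with the bound $u < 1$ stated above.

```python
from statistics import NormalDist

data = [0.464, 0.137, 2.455, -0.323, -0.068,
        0.906, -0.513, -0.525, 0.595, 0.881,
        -0.482, 1.678, -0.057, -1.229, 0.486,
        -1.787, 0.261, 1.237, 1.046, -0.508]
phi = NormalDist(0, 1).cdf          # DF of N(0, 1), fully specified under H0
cuts = [-0.5, 0.5]                  # cells (-inf, -0.5], (-0.5, 0.5], (0.5, inf)

counts = [sum(x <= cuts[0] for x in data),
          sum(cuts[0] < x <= cuts[1] for x in data),
          sum(x > cuts[1] for x in data)]
probs = [phi(cuts[0]), phi(cuts[1]) - phi(cuts[0]), 1 - phi(cuts[1])]
n = len(data)
u = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))
print(counts, [round(p, 4) for p in probs], round(u, 3))
```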


Example 4. In a 72-hour period on a long holiday weekend, there was a total of 306 fatal automobile accidents. The data are as follows:

Number of Fatal Accidents per Hour    Number of Hours
0 or 1                                4
2                                     10
3                                     15
4                                     12
5                                     12
6                                     6
7                                     6
8 or more                             7

Let us test the hypothesis that the number of accidents per hour is a Poisson RV. Since the mean of the Poisson RV is not given, we estimate it by
$$\hat\lambda = \bar{x} = \frac{306}{72} = 4.25.$$
Let us now estimate $p_i = P\{X = i\}$, $i = 0, 1, 2, \ldots$; we have $\hat{p}_0 = e^{-\hat\lambda} = 0.0143$. Note that
$$\frac{P\{X = x + 1\}}{P\{X = x\}} = \frac{\lambda}{x + 1},$$
so that $\hat{p}_{i+1} = [\hat\lambda/(i + 1)]\hat{p}_i$. Thus
$$\hat p_1 = 0.0606,\quad \hat p_2 = 0.1288,\quad \hat p_3 = 0.1825,\quad \hat p_4 = 0.1939,$$
$$\hat p_5 = 0.1648,\quad \hat p_6 = 0.1167,\quad \hat p_7 = 0.0709,\quad \hat P\{X \ge 8\} = 1 - 0.9325 = 0.0675.$$

The observed and expected frequencies are as follows:

Accidents per hour                     0 or 1   2      3       4       5       6      7      8 or more
Observed frequency, $o_i$              4        10     15      12      12      6      6      7
Expected frequency, $e_i = 72\hat p_i$ 5.38     9.28   13.14   13.96   11.87   8.41   5.10   4.86

Then
$$\sum_{i=1}^{8} \frac{(o_i - e_i)^2}{e_i} = 2.74.$$
Since we estimated one parameter, the number of degrees of freedom is $k - r - 1 = 8 - 1 - 1 = 6$. From Table ST3, $\chi^2_{6,0.05} = 12.6$, and since $2.74 < 12.6$, we cannot reject the null hypothesis.
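The computations of Example 4 can be retraced in Python using only the standard library. This is a sketch that follows the recursion $\hat p_{i+1} = [\hat\lambda/(i+1)]\hat p_i$ from the text. Carrying full precision instead of the four-decimal probabilities gives a statistic of about 2.75 rather than 2.74; the conclusion is unchanged.

```python
import math

observed = [4, 10, 15, 12, 12, 6, 6, 7]   # cells: 0 or 1, 2, 3, 4, 5, 6, 7, 8 or more
n_hours = sum(observed)                  # 72 hours
lam = 306 / n_hours                      # estimated Poisson mean, 4.25

# p_0 = e^{-lam}, then p_{i+1} = lam / (i + 1) * p_i
p = [math.exp(-lam)]
for i in range(7):
    p.append(p[-1] * lam / (i + 1))      # p now holds p_0, ..., p_7

cell_probs = [p[0] + p[1]] + p[2:] + [1 - sum(p)]   # merge 0 and 1; tail P{X >= 8}
expected = [n_hours * q for q in cell_probs]
v = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1               # k - r - 1 with r = 1 estimated parameter
print([round(e, 2) for e in expected], round(v, 2), df)
```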


Remark 2. Any application of Theorem 3 or 4 requires that we choose sets $A_1, A_2, \ldots, A_k$, and frequently these are chosen to be disjoint intervals. As a rule of thumb, we choose the length of each interval in such a way that the probability $P\{X \in A_i\}$ under $H_0$ is approximately $1/k$. Moreover, it is desirable to have $n/k \ge 5$ or, rather, $e_i \ge 5$ for each $i$, where $e_i$ is the expected frequency of the $i$th cell. If any of the $e_i$'s is $< 5$, the corresponding interval is pooled with one or more adjoining intervals to make the expected cell frequency at least 5. If any pooling is done, the number of degrees of freedom is the number of classes after pooling, minus 1, minus the number of parameters estimated.
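Remark 2 leaves the exact pooling scheme to the analyst. The Python sketch below shows one simple possibility, offered as an assumption rather than a prescribed rule. It scans the cells from left to right and merges adjacent cells until each pooled expected frequency is at least 5. Any small cells left over at the right end are folded into the last pooled cell. Applied to the expected frequencies of Example 4, this scheme would merge the last two cells (5.10 and 4.86) and leave 7 classes, giving $7 - 1 - 1 = 5$ d.f.

```python
def pool_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells, left to right, until each pooled expected count is >= min_expected.

    Small cells left over at the right end are folded into the last pooled cell.
    This is one possible scheme, not the only acceptable one.
    """
    obs_out, exp_out = [], []
    o_acc, e_acc = 0, 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:      # close the current pooled cell
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc, e_acc = 0, 0.0
    if e_acc > 0:                      # leftover cells at the right end
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:                          # every cell was small: keep a single class
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out

# Observed and expected frequencies of Example 4 (cells 0 or 1, 2, ..., 7, 8 or more).
obs, exp_ = pool_cells([4, 10, 15, 12, 12, 6, 6, 7],
                       [5.38, 9.28, 13.14, 13.96, 11.87, 8.41, 5.10, 4.86])
df = len(obs) - 1 - 1   # classes after pooling, minus 1, minus one estimated parameter
print(obs, [round(e, 2) for e in exp_], df)
```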


Finally, we consider a test of homogeneity of several multinomial distributions. Suppose that we have $c$ samples of sizes $n_1, n_2, \ldots, n_c$ from $c$ multinomial distributions. Let the probabilities associated with the $j$th population be $(p_{1j}, p_{2j}, \ldots, p_{rj})$, where $\sum_{i=1}^r p_{ij} = 1$, $j = 1, 2, \ldots, c$. Given observations $N_{ij}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$, with $\sum_{i=1}^r N_{ij} = n_j$, $j = 1, 2, \ldots, c$, we wish to test $H_0: p_{ij} = p_i$ for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r - 1$. The case $c = 1$ is covered by Theorem 2. By Theorem 2, for each $j$,
$$U_r^{(j)} = \sum_{i=1}^r \frac{(N_{ij} - n_j p_i)^2}{n_j p_i}$$
has a limiting $\chi^2(r-1)$ distribution. Since the samples are independent, the statistic
$$U = \sum_{j=1}^c \sum_{i=1}^r \frac{(N_{ij} - n_j p_i)^2}{n_j p_i}$$
has a limiting $\chi^2(c(r-1))$ distribution. If the $p_i$'s are unknown, we use the MLEs
$$\hat{p}_i = \frac{\sum_{j=1}^c N_{ij}}{\sum_{j=1}^c n_j}, \quad i = 1, 2, \ldots, r - 1,$$
for $p_i$, and we see that the statistic
$$V_c = \sum_{j=1}^c \sum_{i=1}^r \frac{(N_{ij} - n_j \hat{p}_i)^2}{n_j \hat{p}_i}$$
has a limiting chi-square distribution with $c(r-1) - (r-1) = (c-1)(r-1)$ d.f. We reject $H_0$ at (approximate) level $\alpha$ if $V_c > \chi^2_{(c-1)(r-1),\alpha}$.


Example 5. A market analyst believes that there is no difference in preferences of 
television viewers among the four Ohio cities of Toledo, Columbus, Cleveland, and 
Cincinnati. To test this belief, independent random samples of 150, 200, 250, and 
200 persons were selected from the four cities and asked, “What type of program 
do you prefer most: mystery, soap, comedy, or news documentary?” The following 
responses were recorded: 


City 
Program Type Toledo Columbus Cleveland Cincinnati 
Mystery 50 70 85 60 
Soap 45 50 58 40 
Comedy 35 50 72 67 
News 20 30 35 33 
Sample size 150 200 250 200 


Under the null hypothesis that the proportions of viewers who prefer the four types of programs are the same in each city, the maximum likelihood estimates of $p_i$, $i = 1, 2, 3, 4$, are given by
$$\hat p_1 = \frac{50 + 70 + 85 + 60}{150 + 200 + 250 + 200} = \frac{265}{800} \approx 0.33, \qquad \hat p_2 = \frac{45 + 50 + 58 + 40}{800} = \frac{193}{800} \approx 0.24,$$
$$\hat p_3 = \frac{35 + 50 + 72 + 67}{800} = \frac{224}{800} = 0.28, \qquad \hat p_4 = \frac{20 + 30 + 35 + 33}{800} = \frac{118}{800} \approx 0.15.$$
Here $\hat p_1$ is the proportion of people who prefer mystery, and so on. The following table gives the expected frequencies under $H_0$:

Expected Number of Responses Under $H_0$

Program Type   Toledo              Columbus           Cleveland            Cincinnati
Mystery        150 × 0.33 = 49.5   200 × 0.33 = 66    250 × 0.33 = 82.5    200 × 0.33 = 66
Soap           150 × 0.24 = 36     200 × 0.24 = 48    250 × 0.24 = 60      200 × 0.24 = 48
Comedy         150 × 0.28 = 42     200 × 0.28 = 56    250 × 0.28 = 70      200 × 0.28 = 56
News           150 × 0.15 = 22.5   200 × 0.15 = 30    250 × 0.15 = 37.5    200 × 0.15 = 30
Sample size    150                 200                250                  200

It follows that
$$\begin{aligned}
v ={}& \frac{(50 - 49.5)^2}{49.5} + \frac{(45 - 36)^2}{36} + \frac{(35 - 42)^2}{42} + \frac{(20 - 22.5)^2}{22.5} \\
&+ \frac{(70 - 66)^2}{66} + \frac{(50 - 48)^2}{48} + \frac{(50 - 56)^2}{56} + \frac{(30 - 30)^2}{30} \\
&+ \frac{(85 - 82.5)^2}{82.5} + \frac{(58 - 60)^2}{60} + \frac{(72 - 70)^2}{70} + \frac{(35 - 37.5)^2}{37.5} \\
&+ \frac{(60 - 66)^2}{66} + \frac{(40 - 48)^2}{48} + \frac{(67 - 56)^2}{56} + \frac{(33 - 30)^2}{30} = 9.37.
\end{aligned}$$
Since $c = 4$ and $r = 4$, the number of degrees of freedom is $(4 - 1)(4 - 1) = 9$, and we note that under $H_0$
$$0.30 < P\{V_4 > 9.37\} < 0.50.$$
With such a large P-value we can hardly reject $H_0$. The data do not offer any evidence to conclude that the proportions in the four cities are different.
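The homogeneity statistic can be recomputed in Python. This is a sketch; the function name and the rows-by-columns argument layout are ours. Passing the rounded proportions 0.33, 0.24, 0.28, 0.15 reproduces the value 9.37 obtained in Example 5. Using the unrounded MLEs gives a slightly smaller value, about 9.32, which leads to the same conclusion.

```python
def homogeneity_chi_square(table, p=None):
    """V_c for an r x c table of counts, table[i][j] = N_ij (rows: categories, columns: samples).

    If p is None, the pooled MLEs p_i = (row total) / (grand total) are used;
    otherwise p supplies the r category probabilities to plug in.
    Returns (statistic, degrees of freedom (c - 1)(r - 1)).
    """
    r, c = len(table), len(table[0])
    n_j = [sum(table[i][j] for i in range(r)) for j in range(c)]  # sample sizes
    n = sum(n_j)
    if p is None:
        p = [sum(row) / n for row in table]
    v = sum((table[i][j] - n_j[j] * p[i]) ** 2 / (n_j[j] * p[i])
            for i in range(r) for j in range(c))
    return v, (c - 1) * (r - 1)

# Example 5: rows mystery, soap, comedy, news; columns Toledo, Columbus, Cleveland, Cincinnati
counts = [[50, 70, 85, 60],
          [45, 50, 58, 40],
          [35, 50, 72, 67],
          [20, 30, 35, 33]]
v_rounded, df = homogeneity_chi_square(counts, p=[0.33, 0.24, 0.28, 0.15])
v_exact, _ = homogeneity_chi_square(counts)   # unrounded MLEs 265/800, 193/800, ...
print(round(v_rounded, 2), round(v_exact, 2), df)
```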
PROBLEMS 10.3 


1. The standard deviation of capacity for batteries of a standard type is known to be 1.66 ampere-hours. The following capacities (ampere-hours) were recorded for 10 batteries of a new type: 146, 141, 135, 142, 140, 143, 138, 137, 142, 136. Does the new battery differ from the standard type with respect to variability of capacity? (Natrella [73, p. 4-1])


2. A manufacturer recorded the cutoff bias (volts) of a sample of 10 tubes as fol- 
lows: 12.1, 12.3, 11.8, 12.0, 12.4, 12.0, 12.1, 11.9, 12.2, 12.2. The variability of 
cutoff bias for tubes of a standard type as measured by the standard deviation is 
0.208 volt. Is the variability of the new tube with respect to cutoff bias less than 
that of the standard type? (Natrella [73, p. 4-5]) 


3. Approximately equal numbers of four different types of meters are in service and 
all types are believed to be equally likely to break down. The actual numbers of 
breakdowns reported are as follows: 


Type of Meter 1 2 3 4 
Number of Breakdowns Reported 30 40 33 47 


Is there evidence to conclude that the chances of failure of the four types are not 
equal? (Natrella [73, p. 9-4]) 


4. Every clinical thermometer is classified into one of four categories, A, B, C, D, 
on the basis of inspection and test. From past experience it is known that ther- 
mometers produced by a certain manufacturer are distributed among the four 
categories in the following proportions: 


Category A B C D 
Proportion 0.87 0.09 0.03 0.01 


A new lot of 1336 thermometers is submitted by the manufacturer for inspection 
and test and the following distribution into the four categories results: 





Category A B C D 
Number of Thermometers Reported 1188 91 47 10 


Does this new lot of thermometers differ from the previous experience with re- 
gard to proportion of thermometers in each category? (Natrella [73, p. 9-2]) 


5. A computer program is written to generate random numbers, $X$, uniformly in the interval $0 < X < 10$. From 250 consecutive values the following data are obtained:

X-Value     0–1.99   2–3.99   4–5.99   6–7.99   8–9.99
Frequency   38       55       54       41       62

Do these data offer any evidence that the program is not written properly?


6. A machine working correctly cuts pieces of wire to a mean length of 10.5 cm with a standard deviation of 0.15 cm. Sixteen samples of wire were drawn at random from a production batch and measured with the following results (centimeters): 10.4, 10.6, 10.1, 10.3, 10.2, 10.9, 10.5, 10.8, 10.6, 10.5, 10.7, 10.2, 10.7, 10.3, 10.4, 10.5. Test the hypothesis that the machine is working correctly.


7. An experiment consists in tossing a coin until the first head shows up. One hundred repetitions of this experiment are performed. The frequency distribution of the number of trials required for the first head is as follows:

Number of Trials   1    2    3
Frequency          40   32   15

Can we conclude that the coin is fair?


8. Fit a binomial distribution to the following data:

x           0   1    2
Frequency   8   46   55

9. Prove Theorem 1.
10. Three dice are rolled independently 360 times each with the following results:

Face Value    Die 1   Die 2   Die 3
1             50      62      38
2             48      55      60
3             69      61      64
4             45      54      58
5             71      78      73
6             77      50      67
Sample size   360     360     360

Are all the dice equally loaded? That is, test the hypothesis $H_0: p_{i1} = p_{i2} = p_{i3}$, $i = 1, 2, \ldots, 6$, where $p_{i1}$ is the probability of getting an $i$ with die 1, and so on.


11. Independent random samples of 250 Democrats, 150 Republicans, and 150 Independent voters were selected one week before a nonpartisan election for mayor of a large city. Their preferences for candidates Albert, Basu, and Chatfield were recorded as follows:

              Party Affiliation
Preference    Democrat   Republican   Independent
Albert        160        70           90
Basu          32         45           25
Chatfield     30         23           15
Undecided     28         12           20
Sample size   250        150          150

Are the proportions of voters in favor of Albert, Basu, and Chatfield the same within each political affiliation?


12. Of 25 income tax returns audited in a small town, 10 were from low- and middle-income families and 15 from high-income families. Two of the low-income families and four of the high-income families were found to have underpaid their taxes. Are the two proportions of families who underpaid taxes the same?


13. A candidate for a congressional seat checks her progress by taking a random sample of 20 voters each week. Last week, six reported to be in her favor. This week nine reported to be in her favor. Is there evidence to suggest that her campaign is working?


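14. Let $(X_{11}, X_{21}, \ldots, X_{r1}), \ldots, (X_{1c}, X_{2c}, \ldots, X_{rc})$ be independent multinomial RVs with parameters $(n_1, p_{11}, p_{21}, \ldots, p_{r1}), \ldots, (n_c, p_{1c}, p_{2c}, \ldots, p_{rc})$, respectively. Let $X_{i\cdot} = \sum_{j=1}^c X_{ij}$ and $\sum_{j=1}^c n_j = n$. Show that the GLR test for testing $H_0: p_{ij} = p_i$, for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r - 1$, where the $p_i$'s are unknown, against all alternatives can be based on the statistic
$$\lambda(\mathbf{X}) = \frac{\prod_{i=1}^r (X_{i\cdot}/n)^{X_{i\cdot}}}{\prod_{i=1}^r \prod_{j=1}^c (X_{ij}/n_j)^{X_{ij}}}.$$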


10.4 t-TESTS

In this section we investigate one of the most frequently used types of tests in statistics, the tests based on a t-statistic. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, and, as usual, let us write
$$\bar{X} = n^{-1}\sum_{i=1}^n X_i, \qquad S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2.$$
The tests for usual null hypotheses about the mean can be derived using the GLR method. In the following table we summarize the results.





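Reject $H_0$ at level $\alpha$ if:

I. $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$.
   $\sigma^2$ known: $\bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} z_\alpha$.
   $\sigma^2$ unknown: $\bar{x} \ge \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,\alpha}$.

II. $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$.
   $\sigma^2$ known: $\bar{x} \le \mu_0 + \frac{\sigma}{\sqrt{n}} z_{1-\alpha}$.
   $\sigma^2$ unknown: $\bar{x} \le \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,1-\alpha}$.

III. $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$.
   $\sigma^2$ known: $|\bar{x} - \mu_0| \ge \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$.
   $\sigma^2$ unknown: $|\bar{x} - \mu_0| \ge \frac{s}{\sqrt{n}} t_{n-1,\alpha/2}$.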
Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II are called one-tailed tests; the t-test in III, a two-tailed test.

Remark 2. If $\sigma^2$ is known, tests I and II are UMP and test III is UMP unbiased. If $\sigma^2$ is unknown, the t-tests are UMP unbiased and UMP invariant.

Remark 3. If $n$ is large, we may use normal tables instead of t-tables. The assumption of normality may also be dropped because of the central limit theorem. For small samples care is required in applying the proper test, since the tail probabilities under the normal distribution and the t-distribution differ significantly for small $n$ (see Remark 7.4.2).


Example 1. Nine determinations of copper in a certain solution yielded a sample mean of 8.3 percent with a standard deviation of 0.025 percent. Let $\mu$ be the mean of the population of such determinations. Let us test $H_0: \mu = 8.42$ against $H_1: \mu < 8.42$ at level $\alpha = 0.05$.

Here $n = 9$, $\bar{x} = 8.3$, $s = 0.025$, $\mu_0 = 8.42$, and $t_{n-1,1-\alpha} = -t_{8,0.05} = -1.860$. Thus
$$\mu_0 + \frac{s}{\sqrt{n}}t_{n-1,1-\alpha} = 8.42 - \frac{0.025}{3}\times 1.86 = 8.4045.$$
We reject $H_0$ since $8.3 < 8.4045$.
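The critical value of Example 1 (test II with $\sigma^2$ unknown) can be reproduced with a short Python sketch. The function name is ours, and the percentage point $t_{8,0.05} = 1.860$ is taken from a t-table, as in the text.

```python
import math

def lower_tail_t_cutoff(mu0, s, n, t_alpha):
    """mu0 + (s / sqrt(n)) t_{n-1,1-alpha} = mu0 - (s / sqrt(n)) t_{n-1,alpha}, for H1: mu < mu0."""
    return mu0 - s / math.sqrt(n) * t_alpha

# Example 1: n = 9, xbar = 8.3, s = 0.025, mu0 = 8.42, t_{8,0.05} = 1.860 (from a t-table)
cutoff = lower_tail_t_cutoff(8.42, 0.025, 9, 1.860)
xbar = 8.3
print(round(cutoff, 4), xbar <= cutoff)  # 8.4045 True
```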
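We next consider the two-sample case. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Let us write
$$\bar{X} = m^{-1}\sum_{i=1}^m X_i, \qquad \bar{Y} = n^{-1}\sum_{i=1}^n Y_i,$$
$$S_1^2 = (m-1)^{-1}\sum_{i=1}^m (X_i - \bar{X})^2, \qquad S_2^2 = (n-1)^{-1}\sum_{i=1}^n (Y_i - \bar{Y})^2,$$
and
$$S_p^2 = \frac{(m-1)S_1^2 + (n-1)S_2^2}{m + n - 2}.$$
$S_p^2$ is sometimes called the pooled sample variance. The following table summarizes the two-sample tests comparing $\mu_1$ and $\mu_2$: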
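Reject $H_0$ at level $\alpha$ if ($\delta$ = known constant):

I. $H_0: \mu_1 - \mu_2 \le \delta$, $H_1: \mu_1 - \mu_2 > \delta$.
   $\sigma_1^2, \sigma_2^2$ known: $\bar{x} - \bar{y} \ge \delta + z_\alpha\sqrt{\sigma_1^2/m + \sigma_2^2/n}$.
   $\sigma_1^2, \sigma_2^2$ unknown, $\sigma_1 = \sigma_2$: $\bar{x} - \bar{y} \ge \delta + t_{m+n-2,\alpha}\, s_p\sqrt{1/m + 1/n}$.

II. $H_0: \mu_1 - \mu_2 \ge \delta$, $H_1: \mu_1 - \mu_2 < \delta$.
   $\sigma_1^2, \sigma_2^2$ known: $\bar{x} - \bar{y} \le \delta - z_\alpha\sqrt{\sigma_1^2/m + \sigma_2^2/n}$.
   $\sigma_1^2, \sigma_2^2$ unknown, $\sigma_1 = \sigma_2$: $\bar{x} - \bar{y} \le \delta - t_{m+n-2,\alpha}\, s_p\sqrt{1/m + 1/n}$.

III. $H_0: \mu_1 - \mu_2 = \delta$, $H_1: \mu_1 - \mu_2 \neq \delta$.
   $\sigma_1^2, \sigma_2^2$ known: $|\bar{x} - \bar{y} - \delta| \ge z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}$.
   $\sigma_1^2, \sigma_2^2$ unknown, $\sigma_1 = \sigma_2$: $|\bar{x} - \bar{y} - \delta| \ge t_{m+n-2,\alpha/2}\, s_p\sqrt{1/m + 1/n}$.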


Remark 4. The case of most interest is that in which $\delta = 0$. If $\sigma_1^2, \sigma_2^2$ are unknown and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown, then $S_p^2$ is an unbiased estimate of $\sigma^2$. In this case all the two-sample t-tests are UMP unbiased and UMP invariant. Before applying the t-test, one should first make sure that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown. This means applying another test on the data. We consider this test in the next section.





Remark 5. If m+n is large, we use normal tables; if both m and n are large, we 
can drop the assumption of normality, using the CLT. 


Remark 6. The problem of equality of means in sampling from several populations will be considered in Chapter 12.


Remark 7. The two-sample problem when $\sigma_1 \neq \sigma_2$, both unknown, is commonly referred to as the Behrens–Fisher problem. The Welch approximate t-test of $H_0: \mu_1 = \mu_2$ is based on a random number of d.f. $f$ given by
$$\frac{1}{f} = \frac{R^2}{(1+R)^2(m-1)} + \frac{1}{(1+R)^2(n-1)},$$
where
$$R = \frac{S_1^2/m}{S_2^2/n},$$
and the t-statistic
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$$
with $f$ d.f. This approximation has been found to be quite good even for small samples. The formula for $f$ generally leads to noninteger d.f. Linear interpolation in t-tables can be used to obtain the required percentiles for $f$ d.f.
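Welch's $f$ is often written directly as $(S_1^2/m + S_2^2/n)^2 / [(S_1^2/m)^2/(m-1) + (S_2^2/n)^2/(n-1)]$, and dividing through by $(S_2^2/n)^2$ gives the $R$-form above. The Python sketch below computes both forms and the statistic $T$; the function names are ours. For illustration only, it uses the summary figures of Example 2 below, although that example itself assumes equal variances.

```python
import math

def welch_df(s1_sq, m, s2_sq, n):
    """Welch's approximate d.f., written in terms of a = S1^2/m and b = S2^2/n."""
    a, b = s1_sq / m, s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))

def welch_df_from_R(s1_sq, m, s2_sq, n):
    """The same d.f. via 1/f = R^2/((1+R)^2 (m-1)) + 1/((1+R)^2 (n-1)), R = (S1^2/m)/(S2^2/n)."""
    R = (s1_sq / m) / (s2_sq / n)
    inv_f = R ** 2 / ((1 + R) ** 2 * (m - 1)) + 1 / ((1 + R) ** 2 * (n - 1))
    return 1 / inv_f

def welch_t(xbar, ybar, s1_sq, m, s2_sq, n, delta=0.0):
    """T = ((xbar - ybar) - delta) / sqrt(S1^2/m + S2^2/n), delta = mu1 - mu2 under H0."""
    return ((xbar - ybar) - delta) / math.sqrt(s1_sq / m + s2_sq / n)

f = welch_df(420 ** 2, 9, 390 ** 2, 16)
T = welch_t(1309, 1205, 420 ** 2, 9, 390 ** 2, 16)
print(round(f, 2), round(T, 3))
```

The resulting $f$ always lies between $\min(m, n) - 1$ and $m + n - 2$.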


Example 2. The mean life of a sample of 9 light bulbs was observed to be 1309 hours with a standard deviation of 420 hours. A second sample of 16 bulbs chosen from a different batch showed a mean life of 1205 hours with a standard deviation of 390 hours. Let us test to see whether there is a significant difference between the means of the two batches, assuming that the population variances are the same (see also Example 10.5.1).

Here $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$, $m = 9$, $n = 16$, $\bar{x} = 1309$, $s_1 = 420$, $\bar{y} = 1205$, $s_2 = 390$, and let us take $\alpha = 0.05$. We have
$$s_p^2 = \frac{8(420)^2 + 15(390)^2}{23},$$
so that
$$t_{m+n-2,\alpha/2}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}} = t_{23,0.025}\sqrt{\frac{8(420)^2 + 15(390)^2}{23}}\sqrt{\frac{1}{9} + \frac{1}{16}} = 345.44.$$
Since $|\bar{x} - \bar{y}| = |1309 - 1205| = 104 < 345.44$, we cannot reject $H_0$ at level $\alpha = 0.05$.
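The pooled two-sample computation of Example 2 takes a few lines of Python. In this sketch the function name is ours, and $t_{23,0.025} = 2.069$ is taken from a t-table. Full precision gives a cutoff of about 345.43; the 345.44 in the text reflects rounding.

```python
import math

def pooled_t_cutoff(s1, m, s2, n, t_crit):
    """t_{m+n-2, alpha/2} * s_p * sqrt(1/m + 1/n) for the two-sided pooled t-test."""
    sp_sq = ((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2)  # pooled sample variance
    return t_crit * math.sqrt(sp_sq) * math.sqrt(1 / m + 1 / n)

# Example 2: m = 9, n = 16, s1 = 420, s2 = 390, t_{23,0.025} = 2.069 (from a t-table)
cutoff = pooled_t_cutoff(420, 9, 390, 16, 2.069)
diff = abs(1309 - 1205)
print(round(cutoff, 2), diff >= cutoff)
```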


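Quite frequently, one samples from a bivariate normal population with means $\mu_1$, $\mu_2$, variances $\sigma_1^2$, $\sigma_2^2$, and correlation coefficient $\rho$, the hypothesis of interest being $\mu_1 = \mu_2$. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Then $X_j - Y_j$ is $N(\mu_1 - \mu_2, \sigma^2)$, where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. We can therefore treat $D_j = X_j - Y_j$, $j = 1, 2, \ldots, n$, as a sample from a normal population. Let us write
$$\bar{d} = \frac{\sum_{i=1}^n d_i}{n} \quad\text{and}\quad s_d^2 = \frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}.$$
The following table summarizes the resulting tests:

Reject $H_0$ at level $\alpha$ if ($d_0$ = known constant):

I. $H_0: \mu_1 - \mu_2 \ge d_0$, $H_1: \mu_1 - \mu_2 < d_0$: $\bar{d} \le d_0 - \frac{s_d}{\sqrt{n}} t_{n-1,\alpha}$.

II. $H_0: \mu_1 - \mu_2 \le d_0$, $H_1: \mu_1 - \mu_2 > d_0$: $\bar{d} \ge d_0 + \frac{s_d}{\sqrt{n}} t_{n-1,\alpha}$.

III. $H_0: \mu_1 - \mu_2 = d_0$, $H_1: \mu_1 - \mu_2 \neq d_0$: $|\bar{d} - d_0| \ge \frac{s_d}{\sqrt{n}} t_{n-1,\alpha/2}$.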
Remark 8. The case of most importance is that in which $d_0 = 0$. All the t-tests based on the $D_j$'s are UMP unbiased and UMP invariant. If $\sigma$ is known, one can base the test on a standardized normal RV, but in practice such an assumption is quite unrealistic. If $n$ is large, one can replace t-values by the corresponding critical values under the normal distribution.

Remark 9. Clearly, it is not necessary to assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample from a bivariate normal population. It suffices to assume that the differences $D_j$ form a sample from a normal population.


Example 3. Nine adults agreed to test the efficacy of a new diet program. Their weights (pounds) were measured before and after the program and found to be as follows:

Participant   1     2     3     4     5     6     7     8     9
Before        132   139   126   114   122   132   142   119   126
After         124   141   118   116   114   132   145   123   121

Let us test the null hypothesis that the diet is not effective, $H_0: \mu_1 - \mu_2 = 0$, against the alternative $H_1: \mu_1 - \mu_2 > 0$ that it is effective, at level $\alpha = 0.01$. We compute
$$\bar{d} = \frac{8 - 2 + 8 - 2 + 8 + 0 - 3 - 4 + 5}{9} = \frac{18}{9} = 2,$$
$s_d^2 = 26.75$, and $s_d = 5.17$. Thus
$$d_0 + \frac{s_d}{\sqrt{n}} t_{n-1,\alpha} = 0 + \frac{5.17}{3} t_{8,0.01} = \frac{5.17}{3}\times 2.896 = 4.99.$$
Since $\bar{d} = 2 < 4.99$, we cannot reject the hypothesis $H_0$ that the diet is not effective.
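The paired computation of Example 3 can be reproduced in Python with the standard `statistics` module. This is a sketch in which $t_{8,0.01} = 2.896$ is taken from a t-table. Note that `statistics.variance` uses the divisor $n - 1$, matching $s_d^2$.

```python
import math
from statistics import mean, variance

before = [132, 139, 126, 114, 122, 132, 142, 119, 126]
after = [124, 141, 118, 116, 114, 132, 145, 123, 121]
d = [b - a for b, a in zip(before, after)]   # differences D_j = X_j - Y_j

n = len(d)
dbar = mean(d)
s_d = math.sqrt(variance(d))                 # sample SD with divisor n - 1
t_crit = 2.896                               # t_{8, 0.01}, from a t-table
cutoff = 0 + s_d / math.sqrt(n) * t_crit     # d_0 = 0 (test II)
print(dbar, variance(d), round(cutoff, 2), dbar >= cutoff)
```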


PROBLEMS 10.4 


1. The manufacturer of a certain subcompact car claims that the average mileage 
of this model is 30 miles per gallon of regular gasoline. For nine cars of this 
model driven in an identical manner, using 1 gallon of regular gasoline, the mean 
distance traveled was 26 miles with a standard deviation of 2.8 miles. Test the 
manufacturer's claim if you are willing to reject a true claim no more than twice 
in 100. 


2. The nicotine contents of five cigarettes of a certain brand showed a mean of 21.2 milligrams with a standard deviation of 20.05 milligrams. Test the hypothesis that the average nicotine content of this brand of cigarettes does not exceed 19.7 milligrams. Use $\alpha = 0.05$.


3. The additional hours of sleep gained by eight patients in an experiment with a certain drug were recorded as follows:

Patient        1     2      3     4     5     6     7      8
Hours Gained   0.7   −1.1   3.4   0.8   2.0   0.1   −0.2   3.0

Assuming that these patients form a random sample from a population of such patients and that the number of additional hours gained from the drug is a normal random variable, test the hypothesis that the drug has no effect at level $\alpha = 0.10$.


4. The mean life of a sample of 8 light bulbs was found to be 1432 hours with a standard deviation of 436 hours. A second sample of 19 bulbs chosen from a different batch produced a mean life of 1310 hours with a standard deviation of 382 hours. Making appropriate assumptions, test the hypothesis that the two samples came from the same population of light bulbs at level $\alpha = 0.05$.

5. A sample of 25 observations has a mean of 57.6 and a variance of 1.8. A further sample of 20 values has a mean of 55.4 and a variance of 20.5. Test the hypothesis that the two samples came from the same normal population.


6. Two methods were used in a study of the latent heat of fusion of ice. Both method A and method B were conducted with the specimens cooled to −0.72°C. The following data represent the change in total heat from −0.72°C to water, 0°C, in calories per gram of mass:

Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02
Method B: 80.02, 79.74, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97

Perform a test at level 0.05 to see whether the two methods differ with regard to their average performance. (Natrella [73, p. 3-23])


7. In Problem 6, if it is known from past experience that the standard deviations of the two methods are $\sigma_A = 0.024$ and $\sigma_B = 0.033$, test the hypothesis that the methods are the same with regard to their average performance at level $\alpha = 0.05$.


8. During World War II bacterial polysaccharides were investigated as blood plasma extenders. Sixteen samples of hydrolyzed polysaccharides supplied by various manufacturers in order to assess two chemical methods for determining the average molecular weight yielded the following results:

Method A: 62,700; 29,100; 44,400; 47,800; 36,300; 40,000; 43,400; 35,800; 33,900; 44,200; 34,300; 31,300; 38,400; 47,100; 42,100; 42,200
Method B: 56,400; 27,500; 42,200; 46,800; 33,300; 37,100; 37,300; 36,200; 35,200; 38,000; 32,200; 27,300; 36,100; 43,100; 38,400; 39,900

Perform an appropriate test of the hypothesis that the two averages are the same against a one-sided alternative that the average of method A exceeds that of method B. Use $\alpha = 0.05$. (Natrella [73, p. 3-38])


9. The following grade-point averages were collected over a period of 7 years to
determine whether membership in a fraternity is beneficial or detrimental to
grades:

Year             1    2    3    4    5    6    7
Fraternity      2.4  2.0  2.3  2.1  2.1  2.0  2.0
Nonfraternity   2.4  2.2  2.5  2.4  2.3  1.8  1.9

Assuming that the populations were normal, test at the 0.025 level of significance
whether membership in a fraternity is detrimental to grades.


10. Consider the two-sample t-statistic $T = (\bar X - \bar Y)/[S_p\sqrt{1/m + 1/n}]$, where
$S_p^2 = [(m-1)S_1^2 + (n-1)S_2^2]/(m+n-2)$. Suppose that $\sigma_1 \neq \sigma_2$. Let $m, n \to \infty$
such that $m/(m+n) \to \rho$. Show that under $\mu_1 = \mu_2$, $T \xrightarrow{L} U$, where
$U \sim N(0, \tau^2)$ with $\tau^2 = [(1-\rho)\sigma_1^2 + \rho\sigma_2^2]/[\rho\sigma_1^2 + (1-\rho)\sigma_2^2]$. Thus when
$m \approx n$, $\rho \approx 1/2$ and $\tau^2 \approx 1$, and $T$ is approximately $N(0, 1)$ as $m\,(\approx n) \to \infty$.
In this case, a t-test based on $T$ will have approximately the right level.


10.5 F-TESTS 
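The term F-tests refers to tests based on an F-statistic. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively.
We recall that $\sum_{i=1}^m (X_i - \bar X)^2/\sigma_1^2 \sim \chi^2(m-1)$ and $\sum_{j=1}^n (Y_j - \bar Y)^2/\sigma_2^2 \sim \chi^2(n-1)$
are independent RVs, so that the RV

$$F(X, Y) = \frac{\sum_{i=1}^m (X_i - \bar X)^2/[\sigma_1^2(m-1)]}{\sum_{j=1}^n (Y_j - \bar Y)^2/[\sigma_2^2(n-1)]} = \frac{\sigma_2^2}{\sigma_1^2}\,\frac{S_1^2}{S_2^2}$$

is distributed as $F(m-1, n-1)$.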
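The following table summarizes the F-tests (reject $H_0$ at level $\alpha$ if the stated condition holds):

I.   $H_0: \sigma_1^2 \le \sigma_2^2$ against $H_1: \sigma_1^2 > \sigma_2^2$
     $\mu_1, \mu_2$ known: $\dfrac{\sum_{i=1}^m (X_i - \mu_1)^2/m}{\sum_{j=1}^n (Y_j - \mu_2)^2/n} \ge F_{m,n,\alpha}$
     $\mu_1, \mu_2$ unknown: $s_1^2/s_2^2 \ge F_{m-1,n-1,\alpha}$

II.  $H_0: \sigma_1^2 \ge \sigma_2^2$ against $H_1: \sigma_1^2 < \sigma_2^2$
     $\mu_1, \mu_2$ known: $\dfrac{\sum_{j=1}^n (Y_j - \mu_2)^2/n}{\sum_{i=1}^m (X_i - \mu_1)^2/m} \ge F_{n,m,\alpha}$
     $\mu_1, \mu_2$ unknown: $s_2^2/s_1^2 \ge F_{n-1,m-1,\alpha}$

III. $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$
     $\mu_1, \mu_2$ known: $\dfrac{\sum_{i=1}^m (X_i - \mu_1)^2/m}{\sum_{j=1}^n (Y_j - \mu_2)^2/n} \ge F_{m,n,\alpha/2}$ or $\le F_{m,n,1-\alpha/2}$
     $\mu_1, \mu_2$ unknown: $s_1^2/s_2^2 \ge F_{m-1,n-1,\alpha/2}$ or $\le F_{m-1,n-1,1-\alpha/2}$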

Remark 1. Recall (Remark 7.4.5) that
$F_{m,n,1-\alpha} = \{F_{n,m,\alpha}\}^{-1}$.


Remark 2. The tests described above can easily be obtained from the likelihood
ratio procedure. Moreover, in the important case where $\mu_1, \mu_2$ are unknown, tests I
and II are UMP unbiased and UMP invariant. For test III we have chosen equal tails,
as is customarily done for convenience even though the unbiasedness property of the
test is thereby destroyed.


Example 1 (Example 10.4.2 continued). In Example 10.4.2 let us test the
validity of the assumption on which the t-test was based, namely, that the two
populations have the same variance, at level 0.05. We compute $s_1^2/s_2^2 = (420/390)^2 =
196/169 = 1.16$. Since $F_{m-1,n-1,\alpha/2} = F_{8,15,0.025} = 3.20$, we cannot reject
$H_0: \sigma_1 = \sigma_2$.
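As a quick check on the arithmetic of Example 1, the short Python sketch below computes the variance ratio exactly with the standard-library `fractions` module. It uses only the summary values quoted in the example; the critical value 3.20 is the table value $F_{8,15,0.025}$ given there and is not computed by the code. Since the ratio exceeds 1 and, by Remark 1, the lower critical point $F_{8,15,0.975} = 1/F_{15,8,0.025}$ lies below 1, only the upper-tail comparison of test III can lead to rejection here.

```python
from fractions import Fraction


def variance_ratio(s1: Fraction, s2: Fraction) -> Fraction:
    """Statistic of test III when mu1, mu2 are unknown: F = s1^2 / s2^2."""
    return (s1 / s2) ** 2


# Summary values quoted in Example 1 (from Example 10.4.2).
f = variance_ratio(Fraction(420), Fraction(390))
print(f, round(float(f), 2))   # exact ratio and its two-decimal value

F_UPPER = 3.20                 # F_{8,15,0.025}, the table value quoted in Example 1
reject = f >= F_UPPER          # upper-tail part of test III
print("reject H0: sigma1 = sigma2 ->", reject)
```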


An important application of the F-test involves the case where one is testing the
equality of means of two normal populations under the assumption that the variances
are the same, that is, testing whether the two samples come from the same population.
Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from $N(\mu_1, \sigma_1^2)$
and $N(\mu_2, \sigma_2^2)$, respectively. If $\sigma_1^2 = \sigma_2^2$ but is unknown, the t-test rejects $H_0: \mu_1 =
\mu_2$ if $|T| > c$, where $T = (\bar X - \bar Y)/[S_p\sqrt{1/m + 1/n}]$ is the two-sample t-statistic and
$c$ is selected so that $\alpha_2 = P\{|T| > c \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\}$, that is, $c = t_{m+n-2,\alpha_2/2}$, where

$$S_p^2 = \frac{(m-1)S_1^2 + (n-1)S_2^2}{m+n-2},$$

$S_1^2, S_2^2$ being the sample variances. If first an F-test is performed to test $\sigma_1 = \sigma_2$,
and then a t-test to test $\mu_1 = \mu_2$ at levels $\alpha_1$ and $\alpha_2$, respectively, the probability of
accepting both hypotheses when they are true is

$$P\{|T| \le c,\ c_1 \le F \le c_2 \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\},$$

where $c_1$ and $c_2$ are the critical values of the F-test; and if $F$ is independent of $T$, this
probability is $(1 - \alpha_1)(1 - \alpha_2)$. It follows that the combined test has a significance
level $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$. We see that

$$\alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2 \le \alpha_1 + \alpha_2$$

and $\alpha \ge \max(\alpha_1, \alpha_2)$. In fact, $\alpha$ will be close to $\alpha_1 + \alpha_2$, since for small $\alpha_1$ and $\alpha_2$
the product $\alpha_1\alpha_2$ is close to 0.

We show that $F$ is independent of $T$ whenever $\sigma_1 = \sigma_2$. The statistic
$V = \left(\sum_{i=1}^m X_i, \sum_{j=1}^n Y_j, \sum_{i=1}^m (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2\right)$ is a complete sufficient statistic for the
parameter $(\mu_1, \mu_2, \sigma_1 = \sigma_2)$ (see Theorem 8.3.2). Since the distribution of $F$ does not
depend on $\mu_1$, $\mu_2$, and $\sigma_1 = \sigma_2$, it follows (Problem 5) that $F$ is independent of $V$
whenever $\sigma_1 = \sigma_2$. But $T$ is a function of $V$ alone, so that $F$ must be independent of
$T$ also.

In Example 1, the combined test has a significance level of

$$\alpha = 1 - (0.95)(0.95) = 1 - 0.9025 = 0.0975.$$
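The combined significance level is simple arithmetic. The Python sketch below assumes, as argued above, that $F$ and $T$ are independent under $H_0$, evaluates $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$ for a few illustrative choices with $\alpha_1 = \alpha_2$, and prints the bound $\alpha_1 + \alpha_2$ next to it; the function name is only for illustration.

```python
def combined_level(alpha1: float, alpha2: float) -> float:
    """Level of the two-stage procedure, assuming F and T are independent under H0."""
    return 1.0 - (1.0 - alpha1) * (1.0 - alpha2)


for a in (0.01, 0.05, 0.10):
    level = combined_level(a, a)
    # The combined level lies between max(alpha1, alpha2) and alpha1 + alpha2.
    print(f"alpha1 = alpha2 = {a:.2f}: combined level {level:.4f}, alpha1 + alpha2 = {2 * a:.2f}")
```

For $\alpha_1 = \alpha_2 = 0.05$ this reproduces the value 0.0975 of Example 1.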


PROBLEMS 10.5 


1. For the data of Problem 10.4.4, is the assumption of equality of variances on 
which the t-test is based, valid? 


2. Answer the same question for Problems 10.4.5 and 10.4.6. 


3. The performance of each of two different dive-bombing methods is measured a 
dozen times. The sample variances for the two methods are computed to be 5545 
and 4073, respectively. Do the two methods differ in variability? 


4. In Problem 3, does the variability of the first method exceed that of the second 
method? 


5. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with PDF
(PMF) $f(x; \theta)$, $\theta \in \Theta$, where $\Theta$ is an interval in $\mathbb R^k$. Let $T(X)$ be a complete
sufficient statistic for the family $\{f(x; \theta): \theta \in \Theta\}$. If $U(X)$ is a statistic (not a
function of $T$ alone) whose distribution does not depend on $\theta$, show that $U$ is
independent of $T$.


10.6 BAYES AND MINIMAX PROCEDURES 


Let $X_1, X_2, \ldots, X_n$ be a sample from a probability distribution with PDF (PMF) $f_\theta$,
$\theta \in \Theta$. In Section 8.8 we described the general decision problem, namely, once the
statistician observes $x$, she has a set $\mathcal A$ of options available. The problem is to find
a decision function $\delta$ that minimizes the risk $R(\theta, \delta) = E_\theta L(\theta, \delta)$ in some sense.
Thus a minimax solution requires the minimization of $\max_\theta R(\theta, \delta)$, while a Bayes
solution requires the minimization of $R(\pi, \delta) = E R(\theta, \delta)$, where $\pi$ is the a priori
distribution on $\Theta$. In Remark 9.2.1 we considered the problem of hypothesis-testing
as a special case of the general decision problem. The set $\mathcal A$ contains two points, $a_0$
and $a_1$; $a_0$ corresponds to the acceptance of $H_0: \theta \in \Theta_0$, and $a_1$ corresponds to the
rejection of $H_0$. Suppose that the loss function is defined by
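$$
\begin{aligned}
L(\theta, a_0) &= a(\theta) &&\text{if } \theta \in \Theta_1,\ a(\theta) > 0,\\
L(\theta, a_1) &= b(\theta) &&\text{if } \theta \in \Theta_0,\ b(\theta) > 0,\\
L(\theta, a_0) &= 0 &&\text{if } \theta \in \Theta_0,\\
L(\theta, a_1) &= 0 &&\text{if } \theta \in \Theta_1.
\end{aligned}
\tag{1}
$$

Then

$$R(\theta, \delta(X)) = L(\theta, a_0)P_\theta\{\delta(X) = a_0\} + L(\theta, a_1)P_\theta\{\delta(X) = a_1\} \tag{2}$$

$$
R(\theta, \delta(X)) =
\begin{cases}
a(\theta)P_\theta\{\delta(X) = a_0\} & \text{if } \theta \in \Theta_1,\\
b(\theta)P_\theta\{\delta(X) = a_1\} & \text{if } \theta \in \Theta_0.
\end{cases}
\tag{3}
$$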


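A minimax solution to the problem of testing $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$,
where $\Theta = \Theta_0 + \Theta_1$, is to find a rule $\delta$ that minimizes

$$\max_\theta\,[a(\theta)P_\theta\{\delta(X) = a_0\},\ b(\theta)P_\theta\{\delta(X) = a_1\}].$$

We will consider here only the special case of testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.
In that case we want to find a rule $\delta$ that minimizes

$$\max[a P_{\theta_1}\{\delta(X) = a_0\},\ b P_{\theta_0}\{\delta(X) = a_1\}]. \tag{4}$$

We will show that the solution is to reject $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} > k, \tag{5}$$

provided that the constant $k$ is chosen so that

$$R(\theta_0, \delta(X)) = R(\theta_1, \delta(X)), \tag{6}$$

where $\delta$ is the rule defined in (5); that is, the minimax rule $\delta$ is obtained if we choose
$k$ in (5) so that

$$a P_{\theta_1}\{\delta(X) = a_0\} = b P_{\theta_0}\{\delta(X) = a_1\}, \tag{7}$$

or, equivalently, we choose $k$ so that

$$a P_{\theta_1}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} \le k\right\} = b P_{\theta_0}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} > k\right\}. \tag{8}$$

Let $\delta^*$ be any other rule. If $R(\theta_0, \delta) < R(\theta_0, \delta^*)$, then $R(\theta_0, \delta) = R(\theta_1, \delta) <
\max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)]$ and $\delta^*$ cannot be minimax. Thus $R(\theta_0, \delta) \ge R(\theta_0, \delta^*)$,
which means that

$$P_{\theta_0}\{\delta^*(X) = a_1\} \le P_{\theta_0}\{\delta(X) = a_1\} = P\{\text{reject } H_0 \mid H_0 \text{ true}\}. \tag{9}$$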
By the Neyman-Pearson lemma, rule $\delta$ is the most powerful of its size, so that its
power must be at least that of $\delta^*$, that is,

$$P_{\theta_1}\{\delta(X) = a_1\} \ge P_{\theta_1}\{\delta^*(X) = a_1\},$$

so that

$$P_{\theta_1}\{\delta(X) = a_0\} \le P_{\theta_1}\{\delta^*(X) = a_0\}.$$

It follows that

$$a P_{\theta_1}\{\delta(X) = a_0\} \le a P_{\theta_1}\{\delta^*(X) = a_0\}$$

and hence that

$$R(\theta_1, \delta) \le R(\theta_1, \delta^*). \tag{10}$$

This means that

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] = R(\theta_1, \delta) \le R(\theta_1, \delta^*)$$

and thus

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] \le \max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)].$$

Note that in the discrete case one may need some randomization procedure in
order to achieve equality in (8).


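Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0: \mu = \mu_0$
against $H_1: \mu = \mu_1\ (> \mu_0)$, we should choose $k$ so that (8) is satisfied. Since
$f_{\mu_1}(x)/f_{\mu_0}(x) = \exp\{n(\mu_1 - \mu_0)\bar x - n(\mu_1^2 - \mu_0^2)/2\}$ is increasing in $\bar x$, rule (5)
rejects $H_0$ when $\bar x > c$ for a suitable constant $c$. Choosing $k$ is therefore the
same as choosing $c$, and thus $k$, so that

$$a P_{\mu_1}\{\bar X \le c\} = b P_{\mu_0}\{\bar X > c\}$$

or

$$a P_{\mu_1}\left\{\sqrt n(\bar X - \mu_1) \le \sqrt n(c - \mu_1)\right\} = b P_{\mu_0}\left\{\sqrt n(\bar X - \mu_0) > \sqrt n(c - \mu_0)\right\}.$$

Thus

$$a\,\Phi[\sqrt n(c - \mu_1)] = b\,[1 - \Phi(\sqrt n(c - \mu_0))],$$

where $\Phi$ is the DF of an $N(0, 1)$ RV. This can easily be accomplished with the help
of normal tables once we know $a$, $b$, $\mu_0$, $\mu_1$, and $n$.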
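The cutoff $c$ of Example 1 can also be found numerically instead of from normal tables. The sketch below is one way to do it, assuming Python's standard-library `statistics.NormalDist` for $\Phi$; it uses bisection, which works because $a\Phi[\sqrt n(c - \mu_1)] - b[1 - \Phi(\sqrt n(c - \mu_0))]$ is increasing in $c$. The function name and the numbers in the usage lines are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal DF, written Phi in the text


def minimax_cutoff(mu0: float, mu1: float, n: int, a: float, b: float) -> float:
    """Solve a*Phi(sqrt(n)(c - mu1)) = b*[1 - Phi(sqrt(n)(c - mu0))] for c, with mu1 > mu0.

    g(c) below is increasing in c, negative far to the left and positive far
    to the right, so bisection on [mu0 - 10, mu1 + 10] brackets the unique root.
    """
    def g(c: float) -> float:
        return a * PHI(sqrt(n) * (c - mu1)) - b * (1.0 - PHI(sqrt(n) * (c - mu0)))

    lo, hi = mu0 - 10.0, mu1 + 10.0
    for _ in range(200):            # fixed number of halvings; ample for double precision
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only: mu0 = 0, mu1 = 1, n = 4.
print(minimax_cutoff(0.0, 1.0, 4, a=1.0, b=1.0))   # a = b: the midpoint (mu0 + mu1)/2
print(minimax_cutoff(0.0, 1.0, 4, a=2.0, b=1.0))   # a > b: the cutoff moves toward mu0
```

With $a = b$ the equation is symmetric and its root is $c = (\mu_0 + \mu_1)/2$; taking $a > b$ moves $c$ toward $\mu_0$, which lowers the probability of accepting $H_0$ when $\mu = \mu_1$.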


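We next consider the problem of testing $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ from a
Bayesian point of view. Let $\pi(\theta)$ be the a priori probability distribution on $\Theta$. Then

$$
\begin{aligned}
R(\pi, \delta) &= E R(\theta, \delta(X))\\
&= \begin{cases}
\int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta & \text{if } \pi \text{ is a PDF},\\
\sum_\Theta R(\theta, \delta)\pi(\theta) & \text{if } \pi \text{ is a PMF},
\end{cases}\\
&= \begin{cases}
\int_{\Theta_0} b(\theta)\pi(\theta)P_\theta\{\delta(X) = a_1\}\,d\theta + \int_{\Theta_1} a(\theta)\pi(\theta)P_\theta\{\delta(X) = a_0\}\,d\theta & \text{if } \pi \text{ is a PDF},\\
\sum_{\Theta_0} b(\theta)\pi(\theta)P_\theta\{\delta(X) = a_1\} + \sum_{\Theta_1} a(\theta)\pi(\theta)P_\theta\{\delta(X) = a_0\} & \text{if } \pi \text{ is a PMF}.
\end{cases}
\end{aligned}
\tag{11}
$$

The Bayes solution is a decision rule that minimizes $R(\pi, \delta)$. In what follows we
restrict our attention to the case where both $H_0$ and $H_1$ have exactly one point each,
that is, $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$. Let $\pi(\theta_0) = \pi_0$ and $\pi(\theta_1) = 1 - \pi_0 = \pi_1$. Then

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\}, \tag{12}$$

where $b(\theta_0) = b$, $a(\theta_1) = a$ $(a, b > 0)$.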

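Theorem 1. Let $X = (X_1, X_2, \ldots, X_n)$ be an RV of the discrete (continuous)
type with PMF (PDF) $f_\theta$, $\theta \in \Theta = \{\theta_0, \theta_1\}$. Let $\pi(\theta_0) = \pi_0$, $\pi(\theta_1) = 1 - \pi_0 = \pi_1$
be the a priori probability mass function on $\Theta$. A Bayes solution for testing $H_0: X \sim
f_{\theta_0}$ against $H_1: X \sim f_{\theta_1}$, using the loss function (1), is to reject $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} > \frac{b\pi_0}{a\pi_1}. \tag{13}$$

Proof. We wish to find $\delta$ that minimizes

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\}.$$

Now

$$R(\pi, \delta) = E_\theta R(\theta, \delta) = E\{E_\theta\{L(\theta, \delta) \mid X\}\},$$

so it suffices to minimize $E_\theta\{L(\theta, \delta) \mid X\}$.
The a posteriori distribution of $\theta$ is given by

$$
h(\theta \mid x) = \frac{\pi(\theta)f_\theta(x)}{\sum_\theta f_\theta(x)\pi(\theta)}
= \frac{\pi(\theta)f_\theta(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)}
= \begin{cases}
\dfrac{\pi_0 f_{\theta_0}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_0,\\[2ex]
\dfrac{\pi_1 f_{\theta_1}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_1.
\end{cases}
\tag{14}
$$

Thus

$$
E_\theta\{L(\theta, \delta(X)) \mid X = x\} =
\begin{cases}
b\,h(\theta_0 \mid x) & \text{if } \delta(x) = a_1,\\
a\,h(\theta_1 \mid x) & \text{if } \delta(x) = a_0.
\end{cases}
$$

It follows that we reject $H_0$, that is, $\delta(X) = a_1$, if

$$b\,h(\theta_0 \mid x) < a\,h(\theta_1 \mid x),$$

which is the case if and only if

$$b\pi_0 f_{\theta_0}(x) < a\pi_1 f_{\theta_1}(x),$$

as asserted.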


Remark 1. In the Neyman-Pearson lemma we fixed $P_{\theta_0}\{\delta(X) = a_1\}$, the probability
of rejecting $H_0$ when it is true, and minimized $P_{\theta_1}\{\delta(X) = a_0\}$, the probability
of accepting $H_0$ when it is false. Here we no longer have a fixed level $\alpha$ for
$P_{\theta_0}\{\delta(X) = a_1\}$. Instead, we allow it to assume any value as long as $R(\pi, \delta)$, defined
in (12), is minimum.


Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions.
Let $X$ be an RV with PDF (PMF) $f_\theta$, where $\theta$ can take any of the $k$ values
$\theta_1, \theta_2, \ldots, \theta_k$. The problem is to observe $x$ and decide which of the $\theta_i$'s is the
correct value of $\theta$. Let us write $H_i: \theta = \theta_i$, $i = 1, 2, \ldots, k$, and assume that
$\pi(\theta_i) = \pi_i$, $i = 1, 2, \ldots, k$, $\sum_{i=1}^k \pi_i = 1$, is the prior probability distribution on
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_k\}$. Let

$$
L(\theta_i, \delta) =
\begin{cases}
1 & \text{if } \delta \text{ chooses } \theta_j,\ j \ne i,\\
0 & \text{if } \delta \text{ chooses } \theta_i.
\end{cases}
$$

The problem is to find a rule $\delta$ that minimizes $R(\pi, \delta)$. We leave it to the reader to show
that a Bayes solution is to accept $H_i: \theta = \theta_i$ $(i = 1, 2, \ldots, k)$ if

$$\pi_i f_{\theta_i}(x) \ge \pi_j f_{\theta_j}(x) \quad \text{for all } j \ne i,\ j = 1, 2, \ldots, k, \tag{15}$$

where any point lying in more than one such region is assigned to any one of them.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0: \mu = \mu_0$
against $H_1: \mu = \mu_1\ (> \mu_0)$, let us take $a = b$ in the loss function (1). Then
Theorem 1 says that the Bayes rule is one that rejects $H_0$ if

$$\frac{f_{\mu_1}(x)}{f_{\mu_0}(x)} > \frac{\pi_0}{1 - \pi_0},$$

that is,

$$\exp\left\{-\frac{\sum (x_i - \mu_1)^2}{2} + \frac{\sum (x_i - \mu_0)^2}{2}\right\} > \frac{\pi_0}{1 - \pi_0},$$

and

$$\exp\left\{(\mu_1 - \mu_0)\sum x_i + \frac{n(\mu_0^2 - \mu_1^2)}{2}\right\} > \frac{\pi_0}{1 - \pi_0}.$$

This happens if and only if

$$\frac1n \sum x_i > \frac1n\,\frac{\log[\pi_0/(1 - \pi_0)]}{\mu_1 - \mu_0} + \frac{\mu_0 + \mu_1}{2},$$

where the logarithm is to the base $e$. It follows that, if $\pi_0 = 1/2$, the rejection region
consists of $\bar x > (\mu_0 + \mu_1)/2$.
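The cutoff derived in Example 2 is easy to evaluate directly. The Python sketch below (function names and numbers are illustrative) computes it and also provides the log likelihood ratio, so one can confirm that at the cutoff the likelihood ratio equals $\pi_0/(1 - \pi_0)$.

```python
from math import log


def bayes_cutoff(mu0: float, mu1: float, n: int, pi0: float) -> float:
    """Cutoff for xbar in Example 2 (a = b): reject H0 when xbar exceeds the returned value."""
    if not mu1 > mu0:
        raise ValueError("Example 2 assumes mu1 > mu0")
    if not 0.0 < pi0 < 1.0:
        raise ValueError("pi0 must lie strictly between 0 and 1")
    return log(pi0 / (1.0 - pi0)) / (n * (mu1 - mu0)) + 0.5 * (mu0 + mu1)


def log_likelihood_ratio(xbar: float, mu0: float, mu1: float, n: int) -> float:
    """log[f_{mu1}(x) / f_{mu0}(x)] = (mu1 - mu0) n xbar + n (mu0^2 - mu1^2) / 2."""
    return (mu1 - mu0) * n * xbar + 0.5 * n * (mu0 ** 2 - mu1 ** 2)


# Illustrative values: mu0 = 0, mu1 = 1, n = 10.
for pi0 in (0.5, 0.8):
    t = bayes_cutoff(0.0, 1.0, 10, pi0)
    # At the cutoff the log likelihood ratio equals log[pi0 / (1 - pi0)].
    print(pi0, t, log_likelihood_ratio(t, 0.0, 1.0, 10), log(pi0 / (1.0 - pi0)))
```

For $\pi_0 = 1/2$ the function returns $(\mu_0 + \mu_1)/2$, in agreement with the region stated above; a larger $\pi_0$ pushes the cutoff to the right.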


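Example 3. This example illustrates the result described in Remark 2. Let
$X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and suppose that $\mu$ can take any one
of the three values $\mu_1$, $\mu_2$, or $\mu_3$. Let $\mu_1 < \mu_2 < \mu_3$. Assume, for simplicity, that
$\pi_1 = \pi_2 = \pi_3$. Then we accept $H_i: \mu = \mu_i$, $i = 1, 2, 3$, if

$$\exp\left[-\frac12\sum_{k=1}^n (x_k - \mu_i)^2\right] \ge \exp\left[-\frac12\sum_{k=1}^n (x_k - \mu_j)^2\right]$$

for each $j \ne i$, $j = 1, 2, 3$.

It follows that we accept $H_i$ if

$$(\mu_i - \mu_j)\frac{\sum_{k=1}^n x_k}{n} - \frac{\mu_i^2 - \mu_j^2}{2} \ge 0, \quad j \ne i,\ j = 1, 2, 3,$$

that is,

$$(\mu_i - \mu_j)\left[\bar x - \frac{\mu_i + \mu_j}{2}\right] \ge 0.$$

Thus the acceptance region of $H_1$ is given by

$$\bar x \le \frac{\mu_1 + \mu_2}{2} \quad\text{and}\quad \bar x \le \frac{\mu_1 + \mu_3}{2}.$$

Also, the acceptance region of $H_2$ is given by

$$\bar x \ge \frac{\mu_1 + \mu_2}{2} \quad\text{and}\quad \bar x \le \frac{\mu_2 + \mu_3}{2},$$

and that of $H_3$ by

$$\bar x \ge \frac{\mu_1 + \mu_3}{2} \quad\text{and}\quad \bar x \ge \frac{\mu_2 + \mu_3}{2}.$$

In particular, if $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 4$, we accept $H_1$ if $\bar x < 1$, $H_2$ if $1 < \bar x < 3$,
and $H_3$ if $\bar x > 3$. In this case, boundary points 1 and 3 have zero probability, and it
does not matter where we include them.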
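Rule (15) can be sketched in a few lines of Python for $N(\mu_i, 1)$ samples with arbitrary priors. The sketch relies on the fact that $\log f_{\mu}(x) = n\mu\bar x - n\mu^2/2$ plus a term free of $\mu$, so only these scores (plus $\log \pi_i$) need to be compared; the function name is hypothetical, and ties are resolved toward the smallest index, which is one of the choices Remark 2 allows.

```python
from math import log
from typing import Sequence


def bayes_multiple_decision(xbar: float, n: int, mus: Sequence[float],
                            priors: Sequence[float]) -> int:
    """Return i (1-based) such that H_i: mu = mus[i-1] is accepted by rule (15).

    For iid N(mu, 1) data, log[pi_i f_{mu_i}(x)] equals
    log(pi_i) + n*mu_i*xbar - n*mu_i**2/2 plus a term that does not depend on i.
    """
    scores = [log(p) + n * mu * xbar - 0.5 * n * mu * mu
              for mu, p in zip(mus, priors)]
    best = max(range(len(scores)), key=lambda i: scores[i])   # first maximum wins ties
    return best + 1


mus, equal = (0.0, 2.0, 4.0), (1 / 3, 1 / 3, 1 / 3)
for xbar in (0.5, 2.5, 3.5):
    print(xbar, "-> accept H%d" % bayes_multiple_decision(xbar, 1, mus, equal))
```

With $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 4$ and equal priors, the values $\bar x = 0.5, 2.5, 3.5$ lead to $H_1$, $H_2$, $H_3$, matching the regions found in Example 3.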


PROBLEMS 10.6 


1. In Example 1, let $n = 15$, $\mu_0 = 4.7$, and $\mu_1 = 5.2$, and choose $a = b > 0$. Find
the minimax test and compute its power at $\mu = 4.7$ and $\mu = 5.2$.


2. A sample of five observations is taken on a b(1, Ө) RV to test Ho: 0 = і against 
Hy: 6 = 3. 
(a) Find the most powerful test of size a = 0.05. 
(b) FLG, 4) = LG. = 0, LG, 3) = 1, and LG, $) = 2, find the minimax 
rule.
(c) If the prior probabilities of Ө = } and 0 = 3 are ло = i and лу = 3,
respectively, find the Bayes rule. 


3. A sample of size $n$ is to be used from the PDF

$$f_\theta(x) = \theta e^{-\theta x}, \quad x > 0,$$

to test $H_0: \theta = 1$ against $H_1: \theta = 2$. If the a priori distribution on $\theta$ is ло = 2,
лі = i, and $a = b$, find the Bayes solution. Find the power of the test at $\theta = 1$
and $\theta = 2$.


4. Given two normal densities with variances 1 and with means $-1$ and $1$, respectively,
find the Bayes solution based on a single observation when $a = b$ and
(a) $\pi_0 = \pi_1 = 1/2$, and (b) ло = lom = 3.

5. Given three normal densities with variances 1 and with means —1, 0, 1, respec- 
tively, find the Bayes solution to the multiple decision problem based on a single 
observation when тү = 2, mh = 2, лз = {. 


6. For the multiple decision problem described in Remark 2, show that a Bayes
solution is to accept $H_i: \theta = \theta_i$ $(i = 1, 2, \ldots, k)$ if (15) holds.


CHAPTER 11


Confidence Estimation 


11.1 INTRODUCTION


In many problems of statistical inference the experimenter is interested in constructing
a family of sets that contain the true (unknown) parameter value with a specified
(high) probability. If $X$, for example, represents the length of life of a piece of equipment,
the experimenter is interested in a lower bound $\underline\theta$ for the mean $\theta$ of $X$. Since
$\underline\theta = \underline\theta(X)$ will be a function of the observations, one cannot ensure with probability
1 that $\underline\theta(X) \le \theta$. All that one can do is to choose a number $1 - \alpha$ that is close to 1
so that $P_\theta\{\underline\theta(X) \le \theta\} \ge 1 - \alpha$ for all $\theta$. Problems of this type are called problems of
confidence estimation. In this chapter we restrict ourselves mostly to the case where
$\Theta \subseteq \mathbb R$ and consider the problem of setting confidence limits for the parameter $\theta$.

In Section 11.2 we introduce the basic ideas of confidence estimation. Sec- 
tion 11.3 deals with various methods of finding confidence intervals, while Sec- 
tion 11.4 deals with shortest-length confidence intervals. In Section 11.5 we study 
unbiased and equivariant confidence intervals. 


11.2 SOME FUNDAMENTAL NOTIONS OF CONFIDENCE ESTIMATION
So far we have considered a random variable or some function of it as the basic
observable quantity. Let $X$ be an RV, and $a$, $b$ be two given positive real numbers.
Then

$$
\begin{aligned}
P\{a < X < b\} &= P\{a < X \text{ and } X < b\}\\
&= P\left\{\frac{bX}{a} > b \text{ and } X < b\right\}\\
&= P\left\{X < b < \frac{bX}{a}\right\},
\end{aligned}
$$

and if we know the distribution of $X$ and $a$, $b$, we can determine the probability
$P\{a < X < b\}$. Consider the interval $I(X) = (X, bX/a)$. This is an interval with
endpoints that are functions of the RV $X$, and hence it takes the value $(x, bx/a)$
when $X$ takes the value $x$. In other words, $I(X)$ assumes the value $I(x)$ whenever $X$
assumes the value $x$. Thus $I(X)$ is a random quantity and is an example of a random
interval. Note that $I(X)$ includes the value $b$ with a certain fixed probability. For
example, if $b = 1$, $a = 1/2$ and $X$ is $U(0, 1)$, the interval $(X, 2X)$ includes point 1 with
probability 1/2. We note that $I(X)$ is a family of intervals with associated coverage
probability $P\{I(X) \ni 1\} = 1/2$. It has (random) length $\ell(I(X)) = 2X - X = X$. In
general, the larger the length of the interval, the larger the coverage probability. Let
us formalize these notions.


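Definition 1. Let $P_\theta$, $\theta \in \Theta \subseteq \mathbb R^k$, be the set of probability distributions of
an RV $X$. A family of subsets $S(x)$ of $\Theta$, where $S(x)$ depends on the observation $x$
but not on $\theta$, is called a family of random sets. If, in particular, $\Theta \subseteq \mathbb R$ and $S(x)$
is an interval $(\underline\theta(x), \overline\theta(x))$, where $\underline\theta(x)$ and $\overline\theta(x)$ are functions of $x$ alone (and not
$\theta$), we call $S(X)$ a random interval with $\underline\theta(X)$ and $\overline\theta(X)$ as lower and upper bounds,
respectively. $\underline\theta(X)$ may be $-\infty$, and $\overline\theta(X)$ may be $+\infty$.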


In a wide variety of inference problems, one is not interested in estimating the 
parameter or testing some hypothesis concerning it. Rather, one wishes to establish 
a lower or an upper bound, or both, for the real-valued parameter. For example, if X 
is the time to failure of a piece of equipment, one may be interested in a lower bound 
for the mean of X. If the RV X measures the toxicity of a drug, the concern is to find 
an upper bound for the mean. Similarly, if the RV X measures the nicotine content 
of a certain brand of cigarettes, one may be interested in determining an upper and a 
lower bound for the average nicotine content of these cigarettes. 

In this chapter we are interested in the problem of confidence estimation, namely,
that of finding a family of random sets $S(x)$ for a parameter $\theta$ such that for a given
$\alpha$, $0 < \alpha < 1$ (usually small),

$$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta. \tag{1}$$

We restrict our attention mainly to the case where $\theta \in \Theta \subseteq \mathbb R$.

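Definition 2. Let $\theta \in \Theta \subseteq \mathbb R$ and $0 < \alpha < 1$. A function $\underline\theta(X)$ satisfying

$$P_\theta\{\underline\theta(X) \le \theta\} \ge 1 - \alpha \quad \text{for all } \theta \tag{2}$$

is called a lower confidence bound for $\theta$ at confidence level $1 - \alpha$. The quantity

$$\inf_{\theta \in \Theta} P_\theta\{\underline\theta(X) \le \theta\} \tag{3}$$

is called the confidence coefficient.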


Definition 3. A function $\underline\theta$ that minimizes

$$P_\theta\{\underline\theta(X) \le \theta'\} \quad \text{for all } \theta' < \theta \tag{4}$$

subject to (2) is known as a uniformly most accurate (UMA) lower confidence bound
for $\theta$ at confidence level $1 - \alpha$.


Remark 1. Suppose that $X \sim P_\theta$ and (2) holds. Then the smallest probability of
true coverage, $P_\theta\{\underline\theta(X) \le \theta\} = P_\theta\{[\underline\theta(X), \infty) \ni \theta\}$, is $1 - \alpha$. The probability of
false (or incorrect) coverage is $P_\theta\{[\underline\theta(X), \infty) \ni \theta'\} = P_\theta\{\underline\theta(X) \le \theta'\}$ for $\theta' < \theta$.
According to Definition 3, among the class of all lower confidence bounds satisfying
(2), a UMA lower confidence bound has the smallest probability of false coverage.


Similar definitions are given for an upper confidence bound for Ө and a UMA 
upper confidence bound. 


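Definition 4. A family of subsets $S(x)$ of $\Theta \subseteq \mathbb R^k$ is said to constitute a family
of confidence sets at confidence level $1 - \alpha$ if

$$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta, \tag{5}$$

that is, the random set $S(X)$ covers the true parameter value $\theta$ with probability
$\ge 1 - \alpha$. A lower confidence bound corresponds to the special case where $k = 1$ and

$$S(x) = \{\theta: \underline\theta(x) \le \theta < \infty\}; \tag{6}$$

and an upper confidence bound to the case where

$$S(x) = \{\theta: \overline\theta(x) \ge \theta > -\infty\}. \tag{7}$$

If $S(x)$ is of the form

$$S(x) = (\underline\theta(x), \overline\theta(x)), \tag{8}$$

we will call it a confidence interval at confidence level $1 - \alpha$, provided that

$$P_\theta\{\underline\theta(X) < \theta < \overline\theta(X)\} \ge 1 - \alpha \quad \text{for all } \theta, \tag{9}$$

and the quantity

$$\inf_\theta P_\theta\{\underline\theta(X) < \theta < \overline\theta(X)\} \tag{10}$$

will be referred to as the confidence coefficient associated with the random interval.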


Remark 2. We write $S(X) \ni \theta$ to indicate that $X$, and hence $S(X)$, is random
here and not $\theta$, so the probability distribution referred to is that of $X$.


Remark 3. When $X = x$ is the realization, the confidence interval (set) $S(x)$ is
a fixed subset of $\mathbb R^k$. No probability is attached to $S(x)$ itself since neither $\theta$ nor
$S(x)$ has a probability distribution. In fact, either $S(x)$ covers $\theta$ or it does not, and
we will never know which since $\theta$ is unknown. One can give a relative frequency
interpretation. If $(1 - \alpha)$-level confidence sets for $\theta$ were computed a large number of
times, a fraction (approximately) $1 - \alpha$ of these would contain the true (but unknown)
parameter value.
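The relative frequency interpretation in Remark 3 can be illustrated by simulation. The minimal Python sketch below, using only the standard library, repeatedly draws normal samples with known $\sigma$, forms the interval $(\bar X - z_{\alpha/2}\sigma/\sqrt n, \bar X + z_{\alpha/2}\sigma/\sqrt n)$ derived in Example 1 below, and records how often it covers the true mean. The parameter values, the function name, and the seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_coverage(mu: float, sigma: float, n: int, alpha: float,
                       reps: int, seed: int = 1) -> float:
    """Fraction of `reps` simulated intervals xbar -/+ z_{alpha/2} sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)                            # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)          # upper alpha/2 point z_{alpha/2}
    half = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half < mu < xbar + half:
            hits += 1
    return hits / reps


# Arbitrary illustrative settings; the printed fraction should be near 0.95.
print(simulated_coverage(mu=3.0, sigma=2.0, n=10, alpha=0.05, reps=20_000))
```

Any single run differs slightly from $1 - \alpha$; the fraction settles near it as the number of repetitions grows.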


Definition 5. A family of $(1 - \alpha)$-level confidence sets $\{S(x)\}$ is said to be a UMA
family of confidence sets at level $1 - \alpha$ if

$$P_\theta\{S(X) \text{ contains } \theta'\} \le P_\theta\{S'(X) \text{ contains } \theta'\}$$

for all $\theta \ne \theta'$ and any $(1 - \alpha)$-level family of confidence sets $S'(X)$.

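Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, $X_i \sim N(\mu, \sigma^2)$. Consider the
interval $(\bar X - c_1, \bar X + c_2)$. In order for this to be a $(1 - \alpha)$-level confidence interval,
we must have

$$P\{\bar X - c_1 < \mu < \bar X + c_2\} \ge 1 - \alpha,$$

which is the same as

$$P\{\mu - c_2 < \bar X < \mu + c_1\} \ge 1 - \alpha.$$

Thus

$$P\left\{-\frac{c_2\sqrt n}{\sigma} < \frac{\sqrt n(\bar X - \mu)}{\sigma} < \frac{c_1\sqrt n}{\sigma}\right\} \ge 1 - \alpha.$$

Since $\sqrt n(\bar X - \mu)/\sigma \sim N(0, 1)$, we can choose $c_1$ and $c_2$ to have equality, namely,

$$P\left\{-\frac{c_2\sqrt n}{\sigma} < \frac{\sqrt n(\bar X - \mu)}{\sigma} < \frac{c_1\sqrt n}{\sigma}\right\} = 1 - \alpha,$$

provided that $\sigma$ is known. There are infinitely many such pairs of values $(c_1, c_2)$. In
particular, an intuitively reasonable choice is $c_1 = c_2 = c$, say. In that case

$$\frac{c\sqrt n}{\sigma} = z_{\alpha/2},$$

and the confidence interval is $(\bar X - (\sigma/\sqrt n)z_{\alpha/2}, \bar X + (\sigma/\sqrt n)z_{\alpha/2})$. The length of
this interval is $(2\sigma/\sqrt n)z_{\alpha/2}$. Given $\alpha$ and $\sigma$, we can choose $n$ to get a confidence
interval of a fixed length.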
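The interval of Example 1 and the sample size needed for a prescribed length can be computed as follows. This is a sketch assuming Python's `statistics.NormalDist` for $z_{\alpha/2}$, obtained as the $1 - \alpha/2$ quantile of $N(0, 1)$; the function names and inputs are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_quantile(alpha: float) -> float:
    """Upper alpha/2 point z_{alpha/2} of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_interval(xbar: float, sigma: float, n: int, alpha: float) -> tuple:
    """(xbar - (sigma/sqrt n) z, xbar + (sigma/sqrt n) z) with z = z_{alpha/2}; sigma known."""
    half = z_quantile(alpha) * sigma / sqrt(n)
    return xbar - half, xbar + half


def n_for_length(sigma: float, alpha: float, length: float) -> int:
    """Smallest n for which the length (2 sigma / sqrt n) z_{alpha/2} does not exceed `length`."""
    return ceil((2.0 * sigma * z_quantile(alpha) / length) ** 2)


# Illustrative inputs only.
print(z_interval(xbar=10.0, sigma=2.0, n=16, alpha=0.05))
print(n_for_length(sigma=1.0, alpha=0.05, length=0.5))   # (2 * 1.96 / 0.5)^2 is about 61.5, so 62
```

Rounding up in `n_for_length` guarantees that the length requirement is met rather than only approximately met.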

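If $\sigma$ is not known, we have from

$$P\{-c_2 < \bar X - \mu < c_1\} \ge 1 - \alpha$$

that

$$P\left\{-\frac{c_2\sqrt n}{S} < \frac{\sqrt n(\bar X - \mu)}{S} < \frac{c_1\sqrt n}{S}\right\} \ge 1 - \alpha,$$

and once again we can choose pairs of values $(c_1, c_2)$ using a t-distribution with $n - 1$
d.f. such that

$$P\left\{-\frac{c_2\sqrt n}{S} < \frac{\sqrt n(\bar X - \mu)}{S} < \frac{c_1\sqrt n}{S}\right\} = 1 - \alpha.$$

In particular, if we take $c_1 = c_2 = c$, say, then

$$\frac{c\sqrt n}{S} = t_{n-1,\alpha/2},$$

and $(\bar X - (S/\sqrt n)t_{n-1,\alpha/2}, \bar X + (S/\sqrt n)t_{n-1,\alpha/2})$ is a $(1 - \alpha)$-level confidence interval
for $\mu$. The length of this interval is $(2S/\sqrt n)t_{n-1,\alpha/2}$, which is no longer constant.
Therefore, we cannot choose $n$ to get a fixed-width confidence interval of level $1 - \alpha$.
Indeed, the length of this interval can be quite large if $\sigma$ is large. Its expected length
is

$$\frac{2}{\sqrt n}\,t_{n-1,\alpha/2}\,E_\sigma S = \frac{2}{\sqrt n}\,t_{n-1,\alpha/2}\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\,\sigma,$$

which can be made as small as we please by choosing $n$ large enough.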
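Because the expected length of the t-interval is $(2/\sqrt n)\,t_{n-1,\alpha/2}\,E_\sigma S$, its behavior in $n$ is driven by $E_\sigma S$. The Python sketch below evaluates the closed form for $E_\sigma S$ (using `math.lgamma` so that $\Gamma$ does not overflow for large $n$) and compares it with a simulated average of $S$. The percentage point $t_{n-1,\alpha/2}$ is deliberately not computed, since the Python standard library has no t quantile function; the seed, sample sizes, and function names are arbitrary illustrative choices.

```python
import random
from math import exp, lgamma, sqrt


def expected_s(sigma: float, n: int) -> float:
    """E_sigma S = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), normal sample of size n >= 2.

    The Gamma ratio is formed on the log scale with lgamma to avoid overflow.
    """
    return sigma * sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2.0) - lgamma((n - 1) / 2.0))


def sample_sd(xs: list) -> float:
    """Sample standard deviation S with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def simulated_mean_s(sigma: float, n: int, reps: int, seed: int = 2) -> float:
    rng = random.Random(seed)
    return sum(sample_sd([rng.gauss(0.0, sigma) for _ in range(n)])
               for _ in range(reps)) / reps


for n in (2, 5, 30, 1000):
    print(n, round(expected_s(1.0, n), 5))       # increases toward sigma = 1 as n grows
print(simulated_mean_s(2.0, 5, 20_000), expected_s(2.0, 5))
```

Since $E_\sigma S$ stays below $\sigma$ while $t_{n-1,\alpha/2}/\sqrt n \to 0$, the expected length indeed tends to 0 as $n$ grows.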


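Example 2. In Example 1, suppose that we wish to find a confidence interval for
$\sigma^2$ instead when $\mu$ is unknown. Consider the interval $(c_1S^2, c_2S^2)$, $c_1, c_2 > 0$. We
have

$$P\{c_1S^2 < \sigma^2 < c_2S^2\} \ge 1 - \alpha,$$

so that

$$P\left\{c_2^{-1} < \frac{S^2}{\sigma^2} < c_1^{-1}\right\} \ge 1 - \alpha.$$

Since $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$, we can choose pairs of values $(c_1, c_2)$ from the
tables of the chi-square distribution. In particular, we can choose $c_1, c_2$ so that

$$P\left\{\frac{(n-1)S^2}{\sigma^2} \ge \frac{n-1}{c_1}\right\} = \frac{\alpha}{2} = P\left\{\frac{(n-1)S^2}{\sigma^2} \le \frac{n-1}{c_2}\right\}.$$

Then

$$\frac{n-1}{c_1} = \chi^2_{n-1,\alpha/2} \quad\text{and}\quad \frac{n-1}{c_2} = \chi^2_{n-1,1-\alpha/2}.$$

Thus

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right)$$

is a $(1 - \alpha)$-level confidence interval for $\sigma^2$ whenever $\mu$ is unknown. If $\mu$ is known,
then

$$\sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$

Thus we can base the confidence interval on $\sum_{i=1}^n (X_i - \mu)^2$. Proceeding similarly, we
get a $(1 - \alpha)$-level confidence interval as

$$\left(\frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\right).$$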


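Next suppose that both $\mu$ and $\sigma^2$ are unknown and that we want a confidence set
for $(\mu, \sigma^2)$. We have from Boole's inequality

$$
\begin{aligned}
&P\left\{\bar X - \frac{S}{\sqrt n}t_{n-1,\alpha_1/2} < \mu < \bar X + \frac{S}{\sqrt n}t_{n-1,\alpha_1/2},\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right\}\\
&\quad\ge 1 - P\left\{\bar X + \frac{S}{\sqrt n}t_{n-1,\alpha_1/2} \le \mu \text{ or } \bar X - \frac{S}{\sqrt n}t_{n-1,\alpha_1/2} \ge \mu\right\}\\
&\qquad - P\left\{\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \le \sigma^2 \text{ or } \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} \ge \sigma^2\right\}\\
&\quad= 1 - \alpha_1 - \alpha_2,
\end{aligned}
$$

so that the Cartesian product

$$S(X) = \left(\bar X - \frac{S}{\sqrt n}t_{n-1,\alpha_1/2},\ \bar X + \frac{S}{\sqrt n}t_{n-1,\alpha_1/2}\right) \times \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right)$$

is a $(1 - \alpha_1 - \alpha_2)$-level confidence set for $(\mu, \sigma^2)$.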


11.3 METHODS OF FINDING CONFIDENCE INTERVALS 


We now consider some common methods of constructing confidence sets. The most
common of these is the method of pivots.

Definition 1. Let $X \sim P_\theta$. A random variable $T(X, \theta)$ is known as a pivot if the
distribution of $T(X, \theta)$ does not depend on $\theta$.

In many problems, especially in location and scale problems, pivots are easily
found. For example, in sampling from $f(x - \theta)$, $X_{(n)} - \theta$ is a pivot and so is $\bar X - \theta$.
In sampling from $(1/\sigma)f(x/\sigma)$, a scale family, $X_{(n)}/\sigma$ is a pivot and so is $X_{(1)}/\sigma$,
and in sampling from $(1/\sigma)f((x - \theta)/\sigma)$, a location-scale family, $(\bar X - \theta)/S$ is a
pivot, and so is $(X_{(1)} + X_{(n)} - 2\theta)/S$.

If the DF $F_\theta$ of $X_i$ is continuous, then $F_\theta(X_i) \sim U[0, 1]$ and, in case of random
sampling, we can take

$$T(X, \theta) = \prod_{i=1}^n F_\theta(X_i),$$

or

$$-\log T(X, \theta) = -\sum_{i=1}^n \log F_\theta(X_i),$$

as a pivot. Since $F_\theta(X_i) \sim U[0, 1]$, $-\log F_\theta(X_i) \sim G(1, 1)$ and $-\sum_{i=1}^n \log F_\theta(X_i) \sim G(n, 1)$.
It follows that $-\sum_{i=1}^n \log F_\theta(X_i)$ is a pivot.
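The claim that $-\sum \log F_\theta(X_i) \sim G(n, 1)$ whatever the value of $\theta$ can be checked by simulation. The Python sketch below assumes, purely for illustration, that $F_\theta$ is the exponential DF $1 - e^{-x/\theta}$; for two different values of $\theta$ the simulated mean and variance of the pivot should both be close to $n$, which is the mean and the variance of $G(n, 1)$. The function names, $n$, and the seed are arbitrary.

```python
import random
from math import exp, log1p


def pivot_values(theta: float, n: int, reps: int, seed: int = 3) -> list:
    """Simulate -sum_i log F_theta(X_i) for samples of size n.

    Illustrative assumption: F_theta(x) = 1 - exp(-x/theta), the exponential DF with mean theta.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.expovariate(1.0 / theta) for _ in range(n)]   # expovariate takes the rate 1/theta
        # log F_theta(x) = log(1 - exp(-x/theta)), computed with log1p for accuracy
        values.append(-sum(log1p(-exp(-x / theta)) for x in xs))
    return values


def mean_and_var(v: list) -> tuple:
    m = sum(v) / len(v)
    return m, sum((x - m) ** 2 for x in v) / (len(v) - 1)


for theta in (0.5, 3.0):
    m, var = mean_and_var(pivot_values(theta, n=4, reps=20_000))
    print(f"theta = {theta}: mean {m:.3f}, variance {var:.3f}")   # both near n = 4
```

Because both runs use the same seed, they draw the same uniforms, and the pivot values for the two values of $\theta$ essentially coincide, which is the pivot property in action.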

The following result gives a simple sufficient condition for a pivot to yield a
confidence interval for a real-valued parameter $\theta$.

Theorem 1. Let $T(X, \theta)$ be a pivot such that for each $\theta$, $T(X, \theta)$ is a statistic,
and as a function of $\theta$, $T$ is either strictly increasing or decreasing at each $x \in \mathbb R^n$.
Let $\Lambda \subseteq \mathbb R$ be the range of $T$, and for every $\lambda \in \Lambda$ and $x \in \mathbb R^n$, let the equation
$\lambda = T(x, \theta)$ be solvable. Then one can construct a confidence interval for $\theta$ at any
level.

Proof. Let $0 < \alpha < 1$. Then we can choose a pair of numbers $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$
in $\Lambda$, not necessarily unique, such that

$$P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} \ge 1 - \alpha \quad \text{for all } \theta. \tag{1}$$

Since the distribution of $T$ is independent of $\theta$, it is clear that $\lambda_1$ and $\lambda_2$ are independent
of $\theta$. Since, moreover, $T$ is monotone in $\theta$, we can solve the equations

$$T(x, \theta) = \lambda_1(\alpha) \quad\text{and}\quad T(x, \theta) = \lambda_2(\alpha) \tag{2}$$

for every $x$ uniquely for $\theta$. We have

$$P_\theta\{\underline\theta(X) < \theta < \overline\theta(X)\} \ge 1 - \alpha \quad \text{for all } \theta, \tag{3}$$

where $\underline\theta(X) < \overline\theta(X)$ are RVs. This completes the proof.
Remark 1. The condition that $\lambda = T(x, \theta)$ be solvable will be satisfied if, for
example, $T$ is continuous and strictly increasing or decreasing as a function of $\theta$
in $\Theta$.

Note that in the continuous case (that is, when the DF of $T$ is continuous) we can
find a confidence interval with equality on the right side of (1). In the discrete case,
however, this is usually not possible.

Remark 2. Relation (1) is valid even when the assumption of monotonicity of $T$
in the theorem is dropped. In that case, inversion of the inequalities may yield a set
of intervals (random set) $S(X)$ in $\Theta$ instead of a confidence interval.

Remark 3. The argument used in Theorem 1 can be extended to cover the multiparameter
case, and the method will determine a confidence set for all the parameters
of a distribution.


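Example 1. Let $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$, where $\sigma$ is unknown and we seek
a $(1 - \alpha)$-level confidence interval for $\mu$. Let us choose

$$T(X, \mu) = \sqrt n\,\frac{\bar X - \mu}{S},$$

where $\bar X$, $S^2$ are the usual sample statistics. The RV $T(X, \mu)$ has Student's
t-distribution with $n - 1$ d.f., which is independent of $\mu$, and $T(X, \mu)$, as a function
of $\mu$, is monotone. We can clearly choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ (not necessarily uniquely) so
that

$$P\{\lambda_1(\alpha) < T(X, \mu) < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \mu.$$

Solving

$$\lambda_1(\alpha) = \sqrt n\,\frac{\bar X - \mu}{S} \quad\text{and}\quad \lambda_2(\alpha) = \sqrt n\,\frac{\bar X - \mu}{S},$$

we get

$$\overline\mu(X) = \bar X - \lambda_1(\alpha)\frac{S}{\sqrt n}, \qquad \underline\mu(X) = \bar X - \lambda_2(\alpha)\frac{S}{\sqrt n},$$

and a $(1 - \alpha)$-level confidence interval is

$$\left(\bar X - \lambda_2(\alpha)\frac{S}{\sqrt n},\ \bar X - \lambda_1(\alpha)\frac{S}{\sqrt n}\right).$$

In practice, one chooses $\lambda_2(\alpha) = -\lambda_1(\alpha) = t_{n-1,\alpha/2}$.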
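Example 2. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

$$f_\theta(x) = \exp\{-(x - \theta)\}, \quad x > \theta, \text{ and } 0 \text{ elsewhere}.$$

Then the joint PDF of $X$ is

$$f(x; \theta) = \exp\left(-\sum_{i=1}^n x_i + n\theta\right) I[x_{(1)} > \theta].$$

Clearly, $T(X, \theta) = X_{(1)} - \theta$ is a pivot. We can choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ such that

$$P_\theta\{\lambda_1(\alpha) < X_{(1)} - \theta < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \theta,$$

which yields $(X_{(1)} - \lambda_2(\alpha), X_{(1)} - \lambda_1(\alpha))$ as a $(1 - \alpha)$-level confidence interval for $\theta$.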


Remark 4. In Example 1 we chose $\lambda_1 = -\lambda_2$, whereas in Example 2 we did
not indicate how to choose the pair $(\lambda_1, \lambda_2)$ from an infinite set of solutions to
$P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} = 1 - \alpha$. One choice is the equal-tails confidence
interval, which is arrived at by assigning probability $\alpha/2$ to each tail of the distribution
of $T$. This means that we solve

$$\frac{\alpha}{2} = P_\theta\{T(X, \theta) < \lambda_1\} = P_\theta\{T(X, \theta) > \lambda_2\}.$$

In Example 1, symmetry of the distribution leads to the choice indicated. In
Example 2, $Y = X_{(1)} - \theta$ has PDF

$$g(y) = n\exp(-ny) \quad \text{if } y > 0,$$

so we choose $(\lambda_1, \lambda_2)$ from

$$P_\theta\{X_{(1)} - \theta < \lambda_1\} = \frac{\alpha}{2} = P_\theta\{X_{(1)} - \theta > \lambda_2\},$$

giving $\lambda_2(\alpha) = -(1/n)\ln(\alpha/2)$ and $\lambda_1(\alpha) = -(1/n)\ln(1 - \alpha/2)$. Yet another method
is to choose $\lambda_1$, $\lambda_2$ in such a way that the resulting confidence interval has smallest
length. We discuss this method in Section 11.4.
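The equal-tails constants of Remark 4 and the resulting interval for Example 2 can be sketched in Python as follows. The function names are illustrative, and the simulation at the end, with an arbitrary $\theta$, $n$, and seed, only illustrates that the coverage is close to $1 - \alpha$.

```python
import random
from math import exp, log


def equal_tails_constants(n: int, alpha: float) -> tuple:
    """(lambda1, lambda2) with P{Y < lambda1} = alpha/2 = P{Y > lambda2}, Y = X_(1) - theta ~ n e^{-ny}."""
    lam1 = -log(1.0 - alpha / 2.0) / n
    lam2 = -log(alpha / 2.0) / n
    return lam1, lam2


def equal_tails_interval(xs: list, alpha: float) -> tuple:
    """(X_(1) - lambda2, X_(1) - lambda1), the equal-tails interval for theta."""
    lam1, lam2 = equal_tails_constants(len(xs), alpha)
    x1 = min(xs)                                                   # X_(1)
    return x1 - lam2, x1 - lam1


def simulated_coverage(theta: float, n: int, alpha: float, reps: int, seed: int = 4) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [theta + rng.expovariate(1.0) for _ in range(n)]    # PDF exp{-(x - theta)}, x > theta
        lo, hi = equal_tails_interval(xs, alpha)
        hits += lo < theta < hi
    return hits / reps


lam1, lam2 = equal_tails_constants(5, 0.10)
print(1 - exp(-5 * lam1), exp(-5 * lam2))                          # both tail probabilities are alpha/2
print(simulated_coverage(theta=2.0, n=5, alpha=0.10, reps=20_000))  # should be near 0.90
```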

We next consider the method of test inversion and explore the relationship between
a test of hypothesis for a parameter $\theta$ and confidence interval for $\theta$. Consider
the following example.

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma_0^2)$, where $\sigma_0$ is known.
In Example 11.2.1 we showed that

$$\left(\bar X - \frac{\sigma_0}{\sqrt n}z_{\alpha/2},\ \bar X + \frac{\sigma_0}{\sqrt n}z_{\alpha/2}\right)$$

is a $(1 - \alpha)$-level confidence interval for $\mu$. If we define a test $\varphi$ that rejects a value
of $\mu = \mu_0$ if and only if $\mu_0$ lies outside this interval, that is, if and only if

$$\frac{\sqrt n\,|\bar X - \mu_0|}{\sigma_0} \ge z_{\alpha/2},$$

then

$$P_{\mu_0}\left\{\frac{\sqrt n\,|\bar X - \mu_0|}{\sigma_0} \ge z_{\alpha/2}\right\} = \alpha,$$

and the test $\varphi$ is a size $\alpha$ test of $\mu = \mu_0$ against the alternatives $\mu \ne \mu_0$.

Conversely, a family of $\alpha$-level tests for the hypothesis $\mu = \mu_0$ generates a family
of confidence intervals for $\mu$ by simply taking, as the confidence interval for $\mu$, the
set of those $\mu_0$ for which one cannot reject $\mu = \mu_0$.

Similarly, we can generate a family of $\alpha$-level tests from a $(1 - \alpha)$-level lower
(or upper) confidence bound. Suppose that we start with the $(1 - \alpha)$-level lower
confidence bound $\bar X - z_\alpha(\sigma_0/\sqrt n)$ for $\mu$. Then, by defining a test $\varphi(X)$ that rejects
$\mu \le \mu_0$ if and only if $\mu_0 < \bar X - z_\alpha(\sigma_0/\sqrt n)$, we get an $\alpha$-level test for a hypothesis
of the form $\mu \le \mu_0$.
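The duality in Example 3 can be seen numerically by inverting the test over a grid of values $\mu_0$: the values that are not rejected should fill out, up to the grid spacing, the interval $(\bar x - z_{\alpha/2}\sigma_0/\sqrt n, \bar x + z_{\alpha/2}\sigma_0/\sqrt n)$. The Python sketch below does this; the data summary, the grid settings, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejects(xbar: float, mu0: float, sigma0: float, n: int, alpha: float) -> bool:
    """Size-alpha test of mu = mu0 against mu != mu0 (sigma0 known), as in Example 3."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)          # z_{alpha/2}
    return sqrt(n) * abs(xbar - mu0) / sigma0 >= z


def inverted_interval(xbar: float, sigma0: float, n: int, alpha: float,
                      step: float = 1e-3, width: float = 5.0) -> tuple:
    """Smallest and largest grid values mu0 in [xbar - width, xbar + width] that are not rejected."""
    k = int(round(2 * width / step))
    grid = [xbar - width + i * step for i in range(k + 1)]
    kept = [m for m in grid if not rejects(xbar, m, sigma0, n, alpha)]
    return min(kept), max(kept)


# Hypothetical data summary.
xbar, sigma0, n, alpha = 5.0, 2.0, 25, 0.05
half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma0 / sqrt(n)
print(inverted_interval(xbar, sigma0, n, alpha))       # grid inversion of the tests
print((xbar - half, xbar + half))                      # interval of Example 11.2.1
```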


Example 3 is a special case of the duality principle proved in Theorem 2 below.
In the following we restrict attention to the case in which the test function is the
indicator function of a (Borel-measurable) rejection region, that is, we
consider only nonrandomized tests (and confidence intervals). For notational convenience
we write $H_0(\theta_0)$ for the hypothesis $H_0: \theta = \theta_0$ and $H_1(\theta_0)$ for the alternative
hypothesis, which may be one- or two-sided.


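Theorem 2. Let $A(\theta_0)$, $\theta_0 \in \Theta$, denote the region of acceptance of an $\alpha$-level
test of $H_0(\theta_0)$. For each observation $x = (x_1, x_2, \ldots, x_n)$, let $S(x)$ denote the set

$$S(x) = \{\theta: x \in A(\theta), \theta \in \Theta\}. \tag{4}$$

Then $S(x)$ is a family of confidence sets for $\theta$ at confidence level $1 - \alpha$. If, moreover,
$A(\theta_0)$ is UMP for the problem $(\alpha, H_0(\theta_0), H_1(\theta_0))$, then $S(X)$ minimizes

$$P_\theta\{S(X) \ni \theta'\} \quad \text{for all } \theta \in H_1(\theta') \tag{5}$$

among all $(1 - \alpha)$-level families of confidence sets. That is, $S(X)$ is UMA.

Proof. We have

$$S(x) \ni \theta \quad \text{if and only if} \quad x \in A(\theta), \tag{6}$$

so that

$$P_\theta\{S(X) \ni \theta\} = P_\theta\{X \in A(\theta)\} \ge 1 - \alpha,$$

as asserted.

If $S^*(X)$ is any other family of $(1 - \alpha)$-level confidence sets, let $A^*(\theta) =
\{x: S^*(x) \ni \theta\}$. Then

$$P_\theta\{X \in A^*(\theta)\} = P_\theta\{S^*(X) \ni \theta\} \ge 1 - \alpha,$$

and since $A(\theta_0)$ is UMP for $(\alpha, H_0(\theta_0), H_1(\theta_0))$, it follows that

$$P_\theta\{X \in A^*(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\} \quad \text{for any } \theta \in H_1(\theta_0).$$

Hence

$$P_\theta\{S^*(X) \ni \theta_0\} = P_\theta\{X \in A^*(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\} = P_\theta\{S(X) \ni \theta_0\}$$

for all $\theta \in H_1(\theta_0)$. This completes the proof.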


Example 4. Let $X$ be an RV of the continuous type with one-parameter exponential
PDF given by

$$f_\theta(x) = \exp[Q(\theta)T(x) + S'(x) + D(\theta)],$$

where $Q(\theta)$ is a nondecreasing function of $\theta$. Let $H_0: \theta = \theta_0$ and $H_1: \theta < \theta_0$. Then
the acceptance region of a UMP size $\alpha$ test of $H_0$ is of the form

$$A(\theta_0) = \{x: T(x) > c(\theta_0)\}.$$

Since for $\theta > \theta'$,

$$P_{\theta'}\{T(X) < c(\theta')\} = \alpha = P_\theta\{T(X) < c(\theta)\} \le P_{\theta'}\{T(X) < c(\theta)\},$$

$c(\theta)$ may be chosen to be nondecreasing. (The last inequality follows because the
power of the UMP test is at least $\alpha$, the size.) We have

$$S(x) = \{\theta: x \in A(\theta)\},$$

so that $S(x)$ is of the form $(-\infty, c^{-1}(T(x)))$ or $(-\infty, c^{-1}(T(x))]$, where $c^{-1}$ is
defined by

$$c^{-1}(T(x)) = \sup_\theta\{\theta: c(\theta) \le T(x)\}.$$

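In particular, if $X_1, X_2, \ldots, X_n$ is a sample from

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}e^{-x/\theta}, & x > 0,\\[4pt] 0, & \text{otherwise}, \end{cases}$$

then $T(x) = \sum_{i=1}^n x_i$, and for testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta < \theta_0$, the UMP acceptance region is of the form

$$A(\theta_0) = \Big\{x : \sum_{i=1}^n x_i > c(\theta_0)\Big\},$$

where $c(\theta_0)$ is the unique solution of

$$\int_{c(\theta_0)/\theta_0}^{\infty} \frac{y^{n-1}}{(n-1)!}\,e^{-y}\,dy = 1 - \alpha, \qquad 0 < \alpha < 1.$$

The UMA family of $(1 - \alpha)$-level confidence sets is of the form

$$S(x) = \{\theta : x \in A(\theta)\}.$$

In the case $n = 1$,

$$c(\theta_0) = \theta_0 \log\frac{1}{1 - \alpha} \quad\text{and}\quad S(x) = \Big(0,\ \frac{x}{\log[1/(1 - \alpha)]}\Big).$$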
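For general $n$ the acceptance constant is $c(\theta_0) = \theta_0\gamma_{n,\alpha}$, where $\gamma_{n,\alpha}$ (our notation) is the $\alpha$-quantile of $G(n, 1)$, so that $S(x) = (0, \sum x_i/\gamma_{n,\alpha})$. The sketch below assumes integer $n$, so that the $G(n, 1)$ distribution function has the closed (Erlang) form $1 - e^{-y}\sum_{k=0}^{n-1} y^k/k!$, and finds $\gamma_{n,\alpha}$ by bisection. The helper names are hypothetical.

```python
import math

def erlang_cdf(y, n):
    """P{Y <= y} for Y ~ G(n, 1) with integer shape n (Erlang distribution)."""
    if y <= 0:
        return 0.0
    term, partial = 1.0, 1.0                 # running sum of y^k / k!, k = 0..n-1
    for k in range(1, n):
        term *= y / k
        partial += term
    return 1.0 - math.exp(-y) * partial

def erlang_quantile(p, n, iters=200):
    """Solve erlang_cdf(y, n) = p for y by bisection."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, n) < p:             # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def uma_upper_bound(x, alpha):
    """Upper end of S(x) = (0, sum(x) / gamma), gamma = alpha-quantile of G(n, 1)."""
    gamma = erlang_quantile(alpha, len(x))
    return sum(x) / gamma

print(uma_upper_bound([2.0], 0.05), 2.0 / math.log(1 / 0.95))   # the n = 1 case
```

For $n = 1$ the two printed numbers coincide, reproducing $S(x) = (0, x/\log[1/(1-\alpha)])$.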
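Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. In Problem 9.4.3 we asked the reader to show that the test

$$\varphi(x) = \begin{cases} 1, & x_{(n)} > \theta_0 \text{ or } x_{(n)} \le \theta_0\alpha^{1/n},\\ 0, & \text{otherwise}, \end{cases}$$

is a UMP size $\alpha$ test of $\theta = \theta_0$ against $\theta \ne \theta_0$. Then

$$A(\theta_0) = \{x : \theta_0\alpha^{1/n} < x_{(n)} \le \theta_0\},$$

and it follows that $[x_{(n)},\ x_{(n)}\alpha^{-1/n}]$ is a $(1 - \alpha)$-level UMA confidence interval for $\theta$.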
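Since $P_\theta\{X_{(n)} \ge \theta\alpha^{1/n}\} = 1 - \alpha$, the interval covers $\theta$ with probability exactly $1 - \alpha$. A short Monte Carlo sketch can confirm this. The true $\theta$, the sample size, the seed, and the number of replications below are arbitrary assumptions, and the function names are hypothetical.

```python
import random

def uma_uniform_interval(x, alpha):
    """[x_(n), x_(n) * alpha**(-1/n)], obtained by inverting the UMP test of Example 5."""
    m = max(x)
    return m, m * alpha ** (-1.0 / len(x))

def simulated_coverage(theta, n, alpha, reps=20000, seed=12345):
    """Monte Carlo estimate of P_theta{theta in interval}; should be near 1 - alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        lo, hi = uma_uniform_interval(sample, alpha)
        hits += lo <= theta <= hi
    return hits / reps

print(simulated_coverage(theta=3.0, n=5, alpha=0.05))
```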


The third method we consider is based on Bayesian analysis, where we take into account any prior knowledge that the experimenter has about $\theta$. This is reflected in the specification of the prior distribution $\pi(\theta)$ on $\Theta$. Under this setup the claims of probability of coverage are based not on the distribution of $X$ but on the conditional distribution of $\theta$ given $X = x$, the posterior distribution of $\theta$.

Let $\Theta$ be the parameter set, and let the observable RV $X$ have PDF (PMF) $f_\theta(x)$. Suppose that we consider $\theta$ as an RV with distribution $\pi(\theta)$ on $\Theta$. Then $f_\theta(x)$ can be considered as the conditional PDF (PMF) of $X$, given that the RV $\theta$ takes the value $\theta$. Note that we are using the same symbol for the RV $\theta$ and the value that it assumes. We can determine the joint distribution of $X$ and $\theta$, the marginal distribution of $X$, and also the conditional distribution of $\theta$, given $X = x$, as usual. Thus the joint distribution is given by


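$$f(x, \theta) = \pi(\theta) f_\theta(x), \tag{7}$$

and the marginal distribution of $X$ by

$$g(x) = \begin{cases} \sum_\theta \pi(\theta) f_\theta(x) & \text{if } \pi \text{ is a PMF},\\ \int \pi(\theta) f_\theta(x)\,d\theta & \text{if } \pi \text{ is a PDF}. \end{cases} \tag{8}$$

The conditional distribution of $\theta$, given that $x$ is observed, is given by

$$h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{g(x)}, \qquad g(x) > 0. \tag{9}$$

Given $h(\theta \mid x)$, it is easy to find functions $l(x)$, $u(x)$ such that

$$P\{l(X) < \theta < u(X)\} \ge 1 - \alpha,$$

where

$$P\{l(X) < \theta < u(X) \mid X = x\} = \begin{cases} \displaystyle\int_{l(x)}^{u(x)} h(\theta \mid x)\,d\theta, \\ \displaystyle\sum_{l(x) < \theta < u(x)} h(\theta \mid x), \end{cases} \tag{10}$$

depending on whether $h$ is a PDF or a PMF.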


Definition 2. An interval $(l(x), u(x))$ that has probability at least $1 - \alpha$ of including $\theta$ is called a $(1 - \alpha)$-level Bayes interval for $\theta$. Also, $l(x)$ and $u(x)$ are called the lower and upper limits of the interval.


One can similarly define one-sided Bayes intervals or $(1 - \alpha)$-level lower and upper Bayes limits.


Remark 5. We note that under the Bayesian setup we can speak of the probability that $\theta$ lies in the interval $(l(x), u(x))$ as being $1 - \alpha$, because $l$ and $u$ are computed from the posterior distribution of $\theta$ given $x$. To emphasize this distinction between Bayesian and classical analysis, some authors prefer the term credible sets for Bayesian confidence sets.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$, $\mu \in \mathbb{R}$, and let the a priori distribution of $\mu$ be $N(0, 1)$. Then from Example 8.8.6 we know that $h(\mu \mid x)$ is

$$N\Big(\frac{n\bar x}{n+1},\ \frac{1}{n+1}\Big).$$

Thus a $(1 - \alpha)$-level Bayesian confidence interval is

$$\Big(\frac{n\bar x}{n+1} - \frac{z_{\alpha/2}}{\sqrt{n+1}},\ \frac{n\bar x}{n+1} + \frac{z_{\alpha/2}}{\sqrt{n+1}}\Big).$$

A $(1 - \alpha)$-level confidence interval for $\mu$ (treating $\mu$ as fixed) is a random interval with value

$$\Big(\bar x - \frac{z_{\alpha/2}}{\sqrt n},\ \bar x + \frac{z_{\alpha/2}}{\sqrt n}\Big).$$

Thus the Bayesian interval is somewhat shorter: its length is $2z_{\alpha/2}/\sqrt{n+1}$ rather than $2z_{\alpha/2}/\sqrt n$. This is to be expected since we assumed more in the Bayesian case.
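The comparison can be reproduced directly. The sketch below assumes the setup of Example 6 (unit variance, $N(0, 1)$ prior) and uses equal tails of the normal posterior; the data and function names are hypothetical.

```python
from statistics import NormalDist, fmean

def bayes_interval_normal(x, alpha):
    """Equal-tailed (1 - alpha) Bayes interval for mu when X_i ~ N(mu, 1) and mu ~ N(0, 1).
    The posterior is N(n * xbar / (n + 1), 1 / (n + 1)); NormalDist takes a std deviation."""
    n, xbar = len(x), fmean(x)
    post = NormalDist(n * xbar / (n + 1), (1.0 / (n + 1)) ** 0.5)
    return post.inv_cdf(alpha / 2), post.inv_cdf(1 - alpha / 2)

def classical_interval_normal(x, alpha):
    """Classical interval xbar -/+ z_{alpha/2} / sqrt(n) with known variance 1."""
    n, xbar = len(x), fmean(x)
    half = NormalDist().inv_cdf(1 - alpha / 2) / n ** 0.5
    return xbar - half, xbar + half

x = [0.3, -0.1, 0.8, 0.5]                      # hypothetical data, n = 4
print(bayes_interval_normal(x, 0.05), classical_interval_normal(x, 0.05))
```

Besides being shorter, the Bayes interval is centred at $n\bar x/(n+1)$, i.e. shrunk toward the prior mean 0.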


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let the prior distribution on $\Theta = (0, 1)$ be $U(0, 1)$. A simple computation shows that the posterior PDF of $p$, given $x$, is

$$h(p \mid x) = \begin{cases} \dfrac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{B\big(\sum x_i + 1,\ n - \sum x_i + 1\big)}, & 0 < p < 1,\\[6pt] 0, & \text{otherwise}. \end{cases}$$

Given a table of incomplete beta integrals and the observed value of $\sum x_i$, one can easily construct a Bayesian confidence interval for $p$.
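In place of a table, note that for integer parameters the incomplete beta integral is a binomial tail, $I_p(a, b) = P\{\text{Binomial}(a + b - 1, p) \ge a\}$, so an equal-tailed Bayes interval (one of many possible choices) can be computed with the standard library. This is a minimal sketch under that assumption; the function names are hypothetical.

```python
from math import comb

def beta_cdf_int(p, a, b):
    """CDF at p of Beta(a, b) for integers a, b >= 1, via
    I_p(a, b) = P{Binomial(a + b - 1, p) >= a}."""
    m = a + b - 1
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(a, m + 1))

def beta_quantile_int(q, a, b, iters=100):
    """Solve beta_cdf_int(p, a, b) = q for p by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bayes_interval_bernoulli(x, alpha):
    """Equal-tailed (1 - alpha) Bayes interval for p under a U(0, 1) prior;
    the posterior is Beta(sum(x) + 1, n - sum(x) + 1)."""
    n, s = len(x), sum(x)
    a, b = s + 1, n - s + 1
    return beta_quantile_int(alpha / 2, a, b), beta_quantile_int(1 - alpha / 2, a, b)

print(bayes_interval_bernoulli([1, 0, 0, 1, 1, 0, 1, 1, 0, 1], 0.05))   # hypothetical data
```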


Finally, we consider some large-sample methods of constructing confidence intervals. Suppose that $T(X) \sim AN(\theta, v(\theta)/n)$. Then

$$\sqrt n\,\frac{T(X) - \theta}{\sqrt{v(\theta)}} \xrightarrow{L} Z,$$

where $Z \sim N(0, 1)$. Suppose further that there is a statistic $S(X)$ such that $S(X) \xrightarrow{P} v(\theta)$. Then, by Slutsky's theorem,

$$\sqrt n\,\frac{T(X) - \theta}{\sqrt{S(X)}} \xrightarrow{L} Z,$$

and we can obtain an (approximate) $(1 - \alpha)$-level confidence interval for $\theta$ by inverting the inequality

$$\Big|\sqrt n\,\frac{T(X) - \theta}{\sqrt{S(X)}}\Big| \le z_{\alpha/2}.$$


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with finite variance. Also, let $EX_i = \mu$ and $EX_i^2 = \sigma^2 + \mu^2$. From the CLT it follows that

$$\frac{\bar X - \mu}{\sigma/\sqrt n} \xrightarrow{L} Z,$$

where $Z \sim N(0, 1)$. Suppose that we want a $(1 - \alpha)$-level confidence interval for $\mu$ when $\sigma$ is not known. Since $S \xrightarrow{P} \sigma$, for large $n$ the quantity $\sqrt n(\bar X - \mu)/S$ is approximately normally distributed with mean 0 and variance 1. Hence, for large $n$, we can find constants $c_1, c_2$ such that

$$P\Big\{c_1 < \frac{\bar X - \mu}{S/\sqrt n} < c_2\Big\} \approx 1 - \alpha.$$

In particular, we can choose $-c_1 = c_2 = z_{\alpha/2}$ to give

$$\Big(\bar X - \frac{S}{\sqrt n}z_{\alpha/2},\ \bar X + \frac{S}{\sqrt n}z_{\alpha/2}\Big)$$

as an approximate $(1 - \alpha)$-level confidence interval for $\mu$.
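The interval of Example 8 is a one-liner once $z_{\alpha/2}$ is available. This sketch assumes Python's `statistics` module, whose `stdev` uses the $n - 1$ divisor, matching $S$; the function name is hypothetical, and the approximation is only trustworthy for large $n$.

```python
from statistics import NormalDist, fmean, stdev

def large_sample_mean_interval(x, alpha):
    """Approximate (1 - alpha) interval xbar -/+ z_{alpha/2} * S / sqrt(n),
    justified by the CLT together with S -> sigma in probability."""
    n = len(x)
    xbar, s = fmean(x), stdev(x)          # stdev divides by n - 1
    half = NormalDist().inv_cdf(1 - alpha / 2) * s / n ** 0.5
    return xbar - half, xbar + half

print(large_sample_mean_interval(list(range(1, 101)), 0.05))
```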


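Recall that if $\hat\theta$ is the MLE of $\theta$ and the conditions of Theorem 8.7.4 or 8.7.5 are satisfied (caution: see Remark 8.7.4), then

$$\frac{\sqrt n(\hat\theta - \theta)}{\sigma} \xrightarrow{L} N(0, 1) \quad \text{as } n \to \infty,$$

where

$$\sigma^2 = \Big\{E_\theta\Big[\frac{\partial}{\partial\theta}\log f_\theta(X)\Big]^2\Big\}^{-1} = \frac{1}{I(\theta)}.$$

Then we can invert the statement

$$P_\theta\Big\{\sqrt{nI(\theta)}\,|\hat\theta - \theta| \le z_{\alpha/2}\Big\} \approx 1 - \alpha$$

to give an approximate $(1 - \alpha)$-level confidence interval for $\theta$.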

Yet another possible procedure has universal applicability and hence can be used for large or small samples. Unfortunately, however, this procedure usually yields confidence intervals that are much too wide. The method employs the well-known Chebychev inequality (see Section 3.4):

$$P\Big\{|X - EX| < \varepsilon\sqrt{\operatorname{var}(X)}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

If $\hat\theta$ is an estimate of $\theta$ (not necessarily unbiased) with finite variance $\sigma^2(\theta)$, then by Chebychev's inequality

$$P\Big\{|\hat\theta - \theta| < \varepsilon\sqrt{E(\hat\theta - \theta)^2}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\Big(\hat\theta - \varepsilon\sqrt{E(\hat\theta - \theta)^2},\ \hat\theta + \varepsilon\sqrt{E(\hat\theta - \theta)^2}\Big)$$

is a $[1 - (1/\varepsilon^2)]$-level confidence interval for $\theta$. Under some mild consistency conditions one can replace the normalizing constant $\sqrt{E(\hat\theta - \theta)^2}$, which will be some function $\lambda(\theta)$ of $\theta$, by $\lambda(\hat\theta)$. Note that the estimator $\hat\theta$ need not have a limiting normal law.


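Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose that we require a confidence interval for $p$. We know that $E\bar X = p$ and

$$\operatorname{var}(\bar X) = \frac{p(1-p)}{n},$$

so that, by Chebychev's inequality,

$$P\Big\{|\bar X - p| < \varepsilon\sqrt{\frac{p(1-p)}{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Since $p(1 - p) \le \frac14$, we have

$$\operatorname{var}(\bar X) \le \frac{1}{4n}.$$

It follows that

$$P\Big\{\bar X - \frac{\varepsilon}{2\sqrt n} < p < \bar X + \frac{\varepsilon}{2\sqrt n}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

One can now choose $\varepsilon$ and $n$ or, if $n$ is kept constant at a given number, $\varepsilon$ to get the desired level.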
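Actually, the confidence interval obtained above can be improved somewhat. We note that

$$P\Big\{|\bar X - p| < \varepsilon\sqrt{\frac{p(1-p)}{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2},$$

so that

$$P\Big\{|\bar X - p|^2 < \frac{\varepsilon^2}{n}\,p(1-p)\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Now

$$|\bar X - p|^2 < \frac{\varepsilon^2}{n}\,p(1-p)$$

if and only if

$$\Big(1 + \frac{\varepsilon^2}{n}\Big)p^2 - \Big(2\bar X + \frac{\varepsilon^2}{n}\Big)p + \bar X^2 < 0.$$

This last inequality holds if and only if $p$ lies between the two roots of the quadratic equation

$$\Big(1 + \frac{\varepsilon^2}{n}\Big)p^2 - \Big(2\bar X + \frac{\varepsilon^2}{n}\Big)p + \bar X^2 = 0.$$

The two roots are

$$p_1 = \frac{2\bar X + (\varepsilon^2/n) - \sqrt{[2\bar X + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar X^2}}{2[1 + (\varepsilon^2/n)]} = \frac{\bar X}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) - \sqrt{4(\varepsilon^2/n)\bar X(1 - \bar X) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}$$

and

$$p_2 = \frac{2\bar X + (\varepsilon^2/n) + \sqrt{[2\bar X + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar X^2}}{2[1 + (\varepsilon^2/n)]} = \frac{\bar X}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) + \sqrt{4(\varepsilon^2/n)\bar X(1 - \bar X) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}.$$

It follows that

$$P\{p_1 < p < p_2\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Note that when $n$ is large,

$$p_1 \approx \bar X - \varepsilon\sqrt{\frac{\bar X(1 - \bar X)}{n}}, \qquad p_2 \approx \bar X + \varepsilon\sqrt{\frac{\bar X(1 - \bar X)}{n}},$$

as one should expect in view of the fact that $\bar X \to p$ with probability 1 and $\sqrt{\bar X(1 - \bar X)/n}$ estimates $\sqrt{p(1 - p)/n}$. Alternatively, we could have used the CLT (or the large-sample property of the MLE) to arrive at the same result but with $\varepsilon$ replaced by $z_{\alpha/2}$.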
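The two Chebychev intervals of Example 9 are easy to compare numerically. The sketch below assumes $\varepsilon = 1/\sqrt\alpha$, so that both intervals have level at least $1 - \alpha$. It computes the roots $p_1 < p_2$ and the cruder interval $\bar X \pm \varepsilon/(2\sqrt n)$. The function names and the numbers in the usage line are hypothetical.

```python
def chebyshev_binomial_interval(xbar, n, alpha):
    """Roots p1 < p2 of (1 + c) p^2 - (2 xbar + c) p + xbar^2 = 0, where
    c = eps^2 / n and eps = 1 / sqrt(alpha)."""
    c = 1.0 / (alpha * n)                         # eps^2 / n with eps^2 = 1 / alpha
    root = (4 * c * xbar * (1 - xbar) + c * c) ** 0.5
    centre = xbar / (1 + c)
    p1 = centre + (c - root) / (2 * (1 + c))
    p2 = centre + (c + root) / (2 * (1 + c))
    return p1, p2

def crude_chebyshev_interval(xbar, n, alpha):
    """Cruder interval xbar -/+ eps / (2 sqrt(n)), from the bound p(1 - p) <= 1/4."""
    half = (1.0 / alpha) ** 0.5 / (2 * n ** 0.5)
    return xbar - half, xbar + half

print(chebyshev_binomial_interval(0.3, 100, 0.05), crude_chebyshev_interval(0.3, 100, 0.05))
```

The quadratic is nonnegative at $p = 0$ and $p = 1$, so the roots always lie in $[0, 1]$. The improved interval is never longer than the crude one, because its squared length is at most $c/(1 + c) < c$ with $c = \varepsilon^2/n$.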


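Example 10. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. We seek a confidence interval for the parameter $\theta$. The estimator $\hat\theta = X_{(n)}$ is the MLE of $\theta$, which is also sufficient for $\theta$. From Example 5, $[X_{(n)},\ \alpha^{-1/n}X_{(n)}]$ is a $(1 - \alpha)$-level UMA confidence interval for $\theta$.

Let us now apply the method of Chebychev's inequality to the same problem. We have

$$EX_{(n)} = \frac{n}{n+1}\,\theta$$

and

$$E(X_{(n)} - \theta)^2 = \frac{2\theta^2}{(n+1)(n+2)}.$$

Thus

$$P\Big\{\frac{|X_{(n)} - \theta|}{\theta} < \varepsilon\sqrt{\frac{2}{(n+1)(n+2)}}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Since $X_{(n)}$ is consistent for $\theta$, we replace $\theta$ by $X_{(n)}$ in the denominator, and for moderately large $n$,

$$P\Big\{\frac{|X_{(n)} - \theta|}{X_{(n)}} < \varepsilon\sqrt{\frac{2}{(n+1)(n+2)}}\Big\} \gtrsim 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\Big(X_{(n)} - \varepsilon X_{(n)}\frac{\sqrt 2}{\sqrt{(n+1)(n+2)}},\ X_{(n)} + \varepsilon X_{(n)}\frac{\sqrt 2}{\sqrt{(n+1)(n+2)}}\Big)$$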


is an (approximate) $[1 - (1/\varepsilon^2)]$-level confidence interval for $\theta$. Choosing $1 - (1/\varepsilon^2) = 1 - \alpha$, or $\varepsilon = 1/\sqrt\alpha$, and noting that $1/\sqrt{(n+1)(n+2)} \approx 1/n$ for large $n$ and that $X_{(n)} \le \theta$ with probability 1, we can use the approximate confidence interval

$$\Big(X_{(n)},\ X_{(n)}\Big(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\Big)\Big)$$

for $\theta$.

In the examples given above we see that for a given confidence level $1 - \alpha$, a wide choice of confidence intervals is available. Clearly, the larger the interval, the better the chance of trapping a true parameter value. Thus the interval $(-\infty, +\infty)$, which ignores the data completely, will include the real-valued parameter $\theta$ with confidence level 1. However, the larger the confidence interval, the less meaningful it is. Therefore, for a given confidence level $1 - \alpha$, it is desirable to choose the shortest possible confidence interval. Since the length $\overline\theta - \underline\theta$ is, in general, a random variable, one can show that a confidence interval of level $1 - \alpha$ with uniformly minimum length among all such intervals does not exist in most cases. The alternative, to minimize $E_\theta(\overline\theta - \underline\theta)$, is also quite unsatisfactory. In the next section we consider the problem of finding the shortest-length confidence interval based on a suitable statistic.
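As a concrete illustration of how much the choice matters, the following sketch compares, relative to $X_{(n)}$, the lengths of the two intervals for $\theta$ in Example 10: the UMA interval $[X_{(n)}, \alpha^{-1/n}X_{(n)}]$ and the approximate Chebychev interval $(X_{(n)}, X_{(n)}(1 + \sqrt{2/\alpha}/n))$. It assumes the large-$n$ approximations used there, and the function names are hypothetical.

```python
def uma_length_factor(n, alpha):
    """Length of [X_(n), alpha**(-1/n) X_(n)] divided by X_(n)."""
    return alpha ** (-1.0 / n) - 1.0

def chebyshev_length_factor(n, alpha):
    """Length of (X_(n), X_(n) (1 + sqrt(2 / alpha) / n)) divided by X_(n)."""
    return (2.0 / alpha) ** 0.5 / n

for n in (5, 10, 50, 200):
    print(n, round(uma_length_factor(n, 0.05), 4), round(chebyshev_length_factor(n, 0.05), 4))
```

For large $n$ the UMA factor behaves like $\log(1/\alpha)/n$, while the Chebychev factor is $\sqrt{2/\alpha}/n$. At $\alpha = 0.05$ the Chebychev interval is therefore roughly twice as long.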


PROBLEMS 11.3 


1. A sample of size 25 from a normal population with variance 81 produced a mean of 81.2. Find a 0.95-level confidence interval for the mean $\mu$.

2. Let $\bar X$ be the mean of a random sample of size $n$ from $N(\mu, 16)$. Find the smallest sample size $n$ such that $(\bar X - 1, \bar X + 1)$ is a 0.90-level confidence interval for $\mu$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Find a confidence interval for $\mu_1 - \mu_2$ at confidence level $1 - \alpha$ when (a) $\sigma$ is known, and (b) $\sigma$ is unknown.

4. Two independent samples, each of size 7, from normal populations with common unknown variance $\sigma^2$ produced sample means 4.8 and 5.4 and sample variances 8.38 and 7.62, respectively. Find a 0.95-level confidence interval for $\mu_1 - \mu_2$, the difference between the means of populations 1 and 2.

5. In Problem 3, suppose that the first population has variance $\sigma_1^2$ and the second population has variance $\sigma_2^2$, where both $\sigma_1^2$ and $\sigma_2^2$ are known. Find a $(1 - \alpha)$-level confidence interval for $\mu_1 - \mu_2$. What happens if both $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal?

6. In Problem 5, find a confidence interval for the ratio $\sigma_1^2/\sigma_2^2$ both when $\mu_1, \mu_2$ are known and when $\mu_1, \mu_2$ are unknown. What happens if either $\mu_1$ or $\mu_2$ is unknown but the other is known?

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(1, \beta)$ distribution. Find a confidence interval for the parameter $\beta$ with confidence level $1 - \alpha$.

8. (a) Use the large-sample properties of the MLE to construct a $(1 - \alpha)$-level confidence interval for the parameter $\theta$ in each of the following cases: (i) $X_1, X_2, \ldots, X_n$ is a sample from $G(1, 1/\theta)$, and (ii) $X_1, X_2, \ldots, X_n$ is a sample from $P(\theta)$.
(b) In part (a), use Chebychev's inequality to do the same.

9. For a sample of size 1 from the population
$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \qquad 0 < x < \theta,$$
find a $(1 - \alpha)$-level confidence interval for $\theta$.

10. Let $X_1, X_2, \ldots, X_n$ be a sample from the uniform distribution on $N$ points. Find an upper $(1 - \alpha)$-level confidence bound for $N$, based on $\max(X_1, X_2, \ldots, X_n)$.

11. In Example 10, find the smallest $n$ such that the length of the $(1 - \alpha)$-level confidence interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ is $< d$, provided it is known that $\theta \le a$, where $a$ is a known constant.

12. Let $X$ and $Y$ be independent RVs with PDFs $\lambda e^{-\lambda x}$ ($x > 0$) and $\mu e^{-\mu y}$ ($y > 0$), respectively. Find a $(1 - \alpha)$-level confidence region for $(\lambda, \mu)$ of the form $\{(\lambda, \mu)\colon \lambda X + \mu Y \le k\}$.

13. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Find a UMA $(1 - \alpha)$-level upper confidence bound for $\mu$.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from a Poisson distribution with unknown parameter $\lambda$. Assuming that $\lambda$ is a value assumed by a $G(\alpha, \beta)$ RV, find a Bayesian confidence interval for $\lambda$.

15. Let $X_1, X_2, \ldots, X_n$ be a sample from a geometric distribution with parameter $\theta$. Assuming that $\theta$ has a priori PDF that is given by the density of a $B(\alpha, \beta)$ RV, find a Bayesian confidence interval for $\theta$.

16. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and suppose that the a priori PDF for $\mu$ is $U(-1, 1)$. Find a Bayesian confidence interval for $\mu$.


11.4. SHORTEST-LENGTH CONFIDENCE INTERVALS 


We have already remarked that we can increase the confidence level simply by taking a longer confidence interval. Indeed, the worthless interval $-\infty < \theta < \infty$, which simply says that $\theta$ is a point on the real line, has confidence level 1. In practice, one would like to set the level at a given fixed number $1 - \alpha$ ($0 < \alpha < 1$) and, if possible, construct an interval as short as possible among all confidence intervals with the same level. Such an interval is desirable since it is more informative. We have already remarked that shortest-length confidence intervals do not always exist. In this section we investigate the possibility of constructing shortest-length confidence intervals based on simple RVs. The discussion here is based on Guenther [34]. Theorem 11.3.1 is really the key to the following discussion.

Let $X_1, X_2, \ldots, X_n$ be a sample from a PDF $f_\theta(x)$, and let $T(X_1, X_2, \ldots, X_n, \theta) = T_\theta$ be a pivot for $\theta$. Also, let $\lambda_1 = \lambda_1(\alpha)$, $\lambda_2 = \lambda_2(\alpha)$ be chosen so that

$$P\{\lambda_1 < T_\theta < \lambda_2\} = 1 - \alpha, \tag{1}$$

and suppose that (1) can be rewritten as

$$P\{\underline\theta(X) < \theta < \overline\theta(X)\} = 1 - \alpha. \tag{2}$$

For every $T_\theta$, $\lambda_1$ and $\lambda_2$ can be chosen in many ways. We would like to choose $\lambda_1$ and $\lambda_2$ so that $\overline\theta - \underline\theta$ is minimum. Such an interval is a $(1 - \alpha)$-level shortest-length confidence interval based on $T_\theta$. It may be possible, however, to find another RV $T_\theta^*$ that may yield an even shorter interval. Therefore, we are not asserting that the procedure, if it succeeds, will lead to a $(1 - \alpha)$-level confidence interval that has shortest length among all intervals of this level. For $T_\theta$ we use the simplest RV that is a function of a sufficient statistic and $\theta$.


Remark 1. An alternative to minimizing the length of the confidence interval is to minimize the expected length $E_\theta\{\overline\theta(X) - \underline\theta(X)\}$. Unfortunately, this also is quite unsatisfactory since, in general, there does not exist a member of the class of all $(1 - \alpha)$-level confidence intervals that minimizes $E_\theta\{\overline\theta(X) - \underline\theta(X)\}$ for all $\theta$. The procedures applied in finding the shortest-length confidence interval based on a pivot are also applicable in finding an interval that minimizes the expected length. We remark here that the restriction to unbiased confidence intervals is natural if we wish to minimize $E_\theta\{\overline\theta(X) - \underline\theta(X)\}$. See Section 11.5 for definitions and further details.


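Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Then $\bar X$ is sufficient for $\mu$, and we take

$$T_\mu(X) = \frac{\bar X - \mu}{\sigma/\sqrt n}$$

as a pivot. Then

$$1 - \alpha = P\Big\{a < \frac{\bar X - \mu}{\sigma/\sqrt n} < b\Big\} = P\Big\{\bar X - b\frac{\sigma}{\sqrt n} < \mu < \bar X - a\frac{\sigma}{\sqrt n}\Big\}.$$

The length of this confidence interval is $(\sigma/\sqrt n)(b - a)$. We wish to minimize $L = (\sigma/\sqrt n)(b - a)$ such that

$$\Phi(b) - \Phi(a) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx = \int_a^b \varphi(x)\,dx = 1 - \alpha.$$

Here $\varphi$ and $\Phi$, respectively, are the PDF and DF of an $N(0, 1)$ RV. Thus

$$\frac{dL}{da} = \frac{\sigma}{\sqrt n}\Big(\frac{db}{da} - 1\Big)$$

and

$$\varphi(b)\frac{db}{da} - \varphi(a) = 0,$$

giving

$$\frac{dL}{da} = \frac{\sigma}{\sqrt n}\Big[\frac{\varphi(a)}{\varphi(b)} - 1\Big].$$

The minimum occurs when $\varphi(a) = \varphi(b)$, that is, when $a = b$ or $a = -b$. Since $a = b$ does not satisfy

$$\int_a^b \varphi(t)\,dt = 1 - \alpha,$$

we choose $a = -b$. The shortest confidence interval based on $T_\mu$ is therefore the equal-tails interval,

$$\Big(\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar X + z_{\alpha/2}\frac{\sigma}{\sqrt n}\Big).$$

The length of this interval is $2z_{\alpha/2}(\sigma/\sqrt n)$. In this case we can plan our experiment to give a prescribed confidence level and a prescribed length for the interval. To have level $1 - \alpha$ and length $\le 2d$, we choose the smallest $n$ such that

$$d \ge z_{\alpha/2}\frac{\sigma}{\sqrt n} \quad\text{or}\quad n \ge z_{\alpha/2}^2\frac{\sigma^2}{d^2}.$$

This can also be interpreted as follows. If we estimate $\mu$ by $\bar X$, taking a sample of size $n \ge z_{\alpha/2}^2(\sigma^2/d^2)$, we are $100(1 - \alpha)$ percent confident that the error in our estimate is at most $d$.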
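The sample-size rule $n \ge z_{\alpha/2}^2\sigma^2/d^2$ is easy to automate. The following is a minimal sketch; the function name and the values $\sigma = 3$, $d = 1$ are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_half_width(sigma, d, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d, i.e. n >= z^2 sigma^2 / d^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)

print(sample_size_for_half_width(sigma=3.0, d=1.0, alpha=0.05))
```

Rounding up with `math.ceil` is what makes this the smallest admissible $n$. With one observation fewer, the half-width $z_{\alpha/2}\sigma/\sqrt n$ would exceed $d$.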


Example 2. In Example 1, suppose that $\sigma$ is unknown. In that case we use

$$T_\mu = \frac{\bar X - \mu}{S}\sqrt n$$

as a pivot. $T_\mu$ has Student's $t$-distribution with $n - 1$ d.f. Thus

$$1 - \alpha = P\Big\{a < \frac{\bar X - \mu}{S}\sqrt n < b\Big\} = P\Big\{\bar X - b\frac{S}{\sqrt n} < \mu < \bar X - a\frac{S}{\sqrt n}\Big\}.$$

We wish to minimize

$$L = (b - a)\frac{S}{\sqrt n}$$

subject to

$$\int_a^b f_{n-1}(t)\,dt = 1 - \alpha,$$

where $f_{n-1}(t)$ is the PDF of $T_\mu$. We have

$$\frac{dL}{da} = \Big(\frac{db}{da} - 1\Big)\frac{S}{\sqrt n} \quad\text{and}\quad f_{n-1}(b)\frac{db}{da} - f_{n-1}(a) = 0,$$

giving

$$\frac{dL}{da} = \Big[\frac{f_{n-1}(a)}{f_{n-1}(b)} - 1\Big]\frac{S}{\sqrt n}.$$

It follows that the minimum occurs at $a = -b$ (the other solution, $a = b$, is not admissible). The shortest-length confidence interval based on $T_\mu$ is the equal-tails interval,

$$\Big(\bar X - t_{n-1,\alpha/2}\frac{S}{\sqrt n},\ \bar X + t_{n-1,\alpha/2}\frac{S}{\sqrt n}\Big).$$

The length of this interval is $2t_{n-1,\alpha/2}(S/\sqrt n)$, which, being random, may be arbitrarily large. Note that the same confidence interval minimizes the expected length of the interval, namely, $EL = (b - a)c_n(\sigma/\sqrt n)$, where $c_n$ is a constant determined from $ES = c_n\sigma$, and the minimum expected length is $2t_{n-1,\alpha/2}c_n(\sigma/\sqrt n)$.

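Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Suppose that $\mu$ is known and we want a confidence interval for $\sigma^2$. The obvious choice for a pivot $T_{\sigma^2}$ is given by

$$T_{\sigma^2}(X) = \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2},$$

which has a chi-square distribution with $n$ d.f. Now

$$P\Big\{a < \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} < b\Big\} = 1 - \alpha,$$

so that

$$P\Big\{\frac{\sum_{i=1}^n (X_i - \mu)^2}{b} < \sigma^2 < \frac{\sum_{i=1}^n (X_i - \mu)^2}{a}\Big\} = 1 - \alpha.$$

We wish to minimize

$$L = \Big(\frac1a - \frac1b\Big)\sum_{i=1}^n (X_i - \mu)^2$$

subject to

$$\int_a^b f_n(t)\,dt = 1 - \alpha,$$

where $f_n$ is the PDF of a chi-square RV with $n$ d.f. We have

$$\frac{dL}{da} = \Big(-\frac{1}{a^2} + \frac{1}{b^2}\frac{db}{da}\Big)\sum_{i=1}^n (X_i - \mu)^2$$

and

$$\frac{db}{da} = \frac{f_n(a)}{f_n(b)},$$

so that

$$\frac{dL}{da} = \Big[-\frac{1}{a^2} + \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}\Big]\sum_{i=1}^n (X_i - \mu)^2,$$

which vanishes if

$$\frac{1}{a^2} = \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}, \quad\text{that is,}\quad a^2 f_n(a) = b^2 f_n(b).$$

Numerical results giving values of $a$ and $b$ to four decimal places are available (see Tate and Klett [111]). In practice, the simpler equal-tails interval,

$$\Big(\frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\Big),$$

may be used.

If $\mu$ is unknown, we use

$$T_{\sigma^2}(X) = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} = (n - 1)\frac{S^2}{\sigma^2}$$

as a pivot. $T_{\sigma^2}$ has a $\chi^2(n - 1)$ distribution. Proceeding as above, we can show that the shortest-length confidence interval based on $T_{\sigma^2}$ is $\big((n - 1)(S^2/b),\ (n - 1)(S^2/a)\big)$; here $a$ and $b$ are a solution of

$$P\{a < \chi^2(n - 1) < b\} = 1 - \alpha$$

and

$$a^2 f_{n-1}(a) = b^2 f_{n-1}(b),$$

where $f_{n-1}$ is the PDF of a $\chi^2(n - 1)$ RV. Numerical solutions due to Tate and Klett [111] may be used, but in practice, the simpler equal-tails confidence interval,

$$\Big(\frac{(n - 1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)S^2}{\chi^2_{n-1,1-\alpha/2}}\Big),$$

is employed.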
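Instead of consulting Tate and Klett's tables, one can solve the two conditions numerically. The sketch below uses only the Python standard library. It assumes that the chi-square distribution function may be computed from the power series of the regularized lower incomplete gamma function $P(n/2, x/2)$, and it uses bisection throughout. With `power=2` it solves $a^2 f_n(a) = b^2 f_n(b)$ of this example. With `power=1` it solves $\lambda_1 f(\lambda_1) = \lambda_2 f(\lambda_2)$, the condition for the unbiased interval of Section 11.5. All function names are hypothetical.

```python
import math

def chi2_pdf(x, n):
    """Density of the chi-square distribution with n d.f."""
    if x <= 0:
        return 0.0
    return math.exp((n / 2 - 1) * math.log(x) - x / 2
                    - (n / 2) * math.log(2) - math.lgamma(n / 2))

def chi2_cdf(x, n):
    """Chi-square CDF = P(n/2, x/2), summed from its power series
    P(s, z) = z^s e^{-z} / Gamma(s) * sum_k z^k / (s (s+1) ... (s+k))."""
    if x <= 0:
        return 0.0
    s, z = n / 2, x / 2
    term = total = 1.0 / s
    k = 1
    while term > 1e-16 * total:
        term *= z / (s + k)
        total += term
        k += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def chi2_quantile(p, n, iters=100):
    """Solve chi2_cdf(x, n) = p for x by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < p:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi2_interval_constants(n, alpha, power=2, iters=60):
    """Find a < b with F(b) - F(a) = 1 - alpha and a**power f(a) = b**power f(b)."""
    def b_of(a):                      # b is fixed by the coverage constraint
        return chi2_quantile(chi2_cdf(a, n) + 1 - alpha, n)

    def gap(a):                       # negative near a = 0, positive near the alpha-quantile
        b = b_of(a)
        return a ** power * chi2_pdf(a, n) - b ** power * chi2_pdf(b, n)

    lo, hi = 0.0, chi2_quantile(alpha, n)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return a, b_of(a)

a, b = chi2_interval_constants(10, 0.05)
print(a, b, 1 / a - 1 / b)            # compare with the equal-tails 1/3.247 - 1/20.483
```

Since $dL/da$ has the sign of $a^2 f_n(a) - b^2 f_n(b)$, the bisection lands on the minimizer of $1/a - 1/b$. That is why the resulting interval is shorter than the equal-tails one.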
Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Then $X_{(n)}$ is sufficient for $\theta$ with density

$$f_{X_{(n)}}(y) = n\frac{y^{n-1}}{\theta^n}, \qquad 0 < y < \theta.$$

The RV $T_\theta = X_{(n)}/\theta$ has PDF

$$h(t) = nt^{n-1}, \qquad 0 < t < 1.$$

Using $T_\theta$ as pivot, we see that the confidence interval is $(X_{(n)}/b,\ X_{(n)}/a)$ with length $L = X_{(n)}(1/a - 1/b)$. We minimize $L$ subject to

$$\int_a^b nt^{n-1}\,dt = b^n - a^n = 1 - \alpha.$$

Now

$$(1 - \alpha)^{1/n} < b \le 1$$

and, since $a^{n-1}\,da = b^{n-1}\,db$,

$$\frac{dL}{db} = X_{(n)}\Big(-\frac{1}{a^2}\frac{da}{db} + \frac{1}{b^2}\Big) = X_{(n)}\,\frac{a^{n+1} - b^{n+1}}{b^2a^{n+1}} < 0,$$

so that the minimum occurs at $b = 1$. The shortest interval is therefore $(X_{(n)},\ X_{(n)}/\alpha^{1/n})$. Note that

$$EL = \Big(\frac1a - \frac1b\Big)EX_{(n)} = \frac{n\theta}{n+1}\Big(\frac1a - \frac1b\Big),$$

which is minimized subject to

$$b^n - a^n = 1 - \alpha,$$

where $b = 1$ and $a = \alpha^{1/n}$. The expected length of the interval that minimizes $EL$ is $[(1/\alpha^{1/n}) - 1][n\theta/(n+1)]$, which is also the expected length of the shortest confidence interval based on $X_{(n)}$. Note that the length of the interval $(X_{(n)},\ \alpha^{-1/n}X_{(n)})$ goes to 0 as $n \to \infty$.


For some results on asymptotically shortest-length confidence intervals, we refer the reader to Wilks [117, pp. 374-376].
PROBLEMS 11.4


1. Let $X_1, X_2, \ldots, X_n$ be a sample from
$$f_\theta(x) = \begin{cases} e^{-(x - \theta)} & \text{if } x > \theta,\\ 0 & \text{otherwise}. \end{cases}$$
Find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.

3. In Problem 11.3.9, how will you find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$ based on the statistic $X/\theta$?

4. Let $T(X, \theta)$ be a pivot of the form $T(X, \theta) = T(X) - \theta$. Show how one can construct a confidence interval for $\theta$ with fixed width $d$ and maximum possible confidence coefficient. In particular, construct a confidence interval that has fixed width $d$ and maximum possible confidence coefficient for the mean $\mu$ of a normal population with variance 1. Find the smallest size $n$ for which this confidence interval has a confidence coefficient $\ge 1 - \alpha$. Repeat the above in sampling from an exponential PDF
$$f_\mu(x) = e^{\mu - x} \ \text{ for } x > \mu \quad\text{and}\quad f_\mu(x) = 0 \ \text{ for } x \le \mu.$$
(Desu [20])

5. Let $X_1, X_2, \ldots, X_n$ be a random sample from
$$f_\theta(x) = \frac{1}{2\theta}\exp\Big(-\frac{|x|}{\theta}\Big), \qquad x \in \mathbb{R},\ \theta > 0.$$
Find the shortest-length $(1 - \alpha)$-level confidence interval for $\theta$, based on the sufficient statistic $\sum_{i=1}^n |X_i|$.

6. In Example 4, let $R = X_{(n)} - X_{(1)}$. Find a $(1 - \alpha)$-level confidence interval for $\theta$ of the form $(R, R/c)$. Compare the expected length of this interval to the one computed in Example 4.

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Pareto PDF $f_\theta(x) = \theta/x^2$, $x > \theta$, and $= 0$ for $x \le \theta$. Show that the shortest-length confidence interval for $\theta$ based on $X_{(1)}$ is $(X_{(1)}\alpha^{1/n},\ X_{(1)})$. (Use $\theta/X_{(1)}$ as a pivot.)

8. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF $f_\theta(x) = 1/(\theta_2 - \theta_1)$, $\theta_1 < x < \theta_2$, $\theta_1 < \theta_2$, and $= 0$ otherwise. Let $R = X_{(n)} - X_{(1)}$. Using $R/(\theta_2 - \theta_1)$ as a pivot for estimating $\theta_2 - \theta_1$, show that the shortest-length confidence interval is of the form $(R, R/c)$, where $c$ is determined from the level as a solution of $c^{n-1}[(n - 1)c - n] + \alpha = 0$. (Ferentinos [24])
11.5 UNBIASED AND EQUIVARIANT CONFIDENCE INTERVALS


In Section 11.3 we studied test inversion as one of the methods of constructing con- 
fidence intervals. We showed that UMP tests lead to UMA confidence intervals. In 
Chapter 9 we saw that UMP tests generally do not exist. In such situations we either 
restrict consideration to smaller subclasses of tests by requiring that the test functions 
have some desirable properties, or we restrict the class of alternatives to those near 
the null parameter values. In this section we follow a similar approach in constructing 
confidence intervals. 


Definition 1. A family $\{S(x)\}$ of confidence sets for a parameter $\theta$ is said to be unbiased at confidence level $1 - \alpha$ if

$$P_\theta\{S(X) \text{ contains } \theta\} \ge 1 - \alpha \tag{1}$$

and

$$P_\theta\{S(X) \text{ contains } \theta'\} \le 1 - \alpha \quad \text{for all } \theta, \theta' \in \Theta,\ \theta \ne \theta'. \tag{2}$$

If $S(X)$ is an interval satisfying (1) and (2), we call it a $(1 - \alpha)$-level unbiased confidence interval. If a family of unbiased confidence sets at level $1 - \alpha$ is UMA in the class of all $(1 - \alpha)$-level unbiased confidence sets, we call it a UMA unbiased (UMAU) family of confidence sets at level $1 - \alpha$. In other words, if $S^*(x)$ satisfies (1) and (2) and minimizes

$$P_\theta\{S(X) \text{ contains } \theta'\} \quad \text{for } \theta, \theta' \in \Theta,\ \theta \ne \theta'$$

among all unbiased families of confidence sets $S(X)$ at level $1 - \alpha$, then $S^*(X)$ is a UMAU family of confidence sets at level $1 - \alpha$.


Remark 1. Definition 1 says that a family $S(X)$ of confidence sets for a parameter $\theta$ is unbiased at level $1 - \alpha$ if the probability of true coverage is at least $1 - \alpha$ and that of false coverage is at most $1 - \alpha$. In other words, $S(X)$ traps a true parameter value more often than it does a false one.


Theorem 1. Let $A(\theta_0)$ be the acceptance region of a UMP unbiased size $\alpha$ test of $H_0(\theta_0)\colon \theta = \theta_0$ against $H_1(\theta_0)\colon \theta \ne \theta_0$ for each $\theta_0$. Then $S(x) = \{\theta : x \in A(\theta)\}$ is a UMA unbiased family of confidence sets at level $1 - \alpha$.


Proof. To see that $S(x)$ is unbiased, we note that since $A(\theta)$ is the acceptance region of an unbiased test,

$$P_\theta\{S(X) \text{ contains } \theta'\} = P_\theta\{X \in A(\theta')\} \le 1 - \alpha.$$

We next show that $S(X)$ is UMA. Let $S^*(x)$ be any other unbiased $(1 - \alpha)$-level family of confidence sets, and write $A^*(\theta) = \{x : S^*(x) \text{ contains } \theta\}$. Then $P_\theta\{X \in A^*(\theta')\} = P_\theta\{S^*(X) \text{ contains } \theta'\} \le 1 - \alpha$, and it follows that $A^*(\theta)$ is the acceptance region of an unbiased size $\alpha$ test. Hence

$$\begin{aligned} P_\theta\{S^*(X) \text{ contains } \theta'\} &= P_\theta\{X \in A^*(\theta')\}\\ &\ge P_\theta\{X \in A(\theta')\}\\ &= P_\theta\{S(X) \text{ contains } \theta'\}. \end{aligned}$$

The inequality follows since $A(\theta)$ is the acceptance region of a UMP unbiased test. This completes the proof.


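Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. For testing $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$, it is known (Ferguson [25, p. 232]) that the $t$-test

$$\varphi(x) = \begin{cases} 1, & \Big|\dfrac{\sqrt n(\bar x - \mu_0)}{s}\Big| > c,\\[6pt] 0, & \text{otherwise}, \end{cases}$$

where $\bar x = \sum x_i/n$ and $s^2 = (n - 1)^{-1}\sum (x_i - \bar x)^2$, is UMP unbiased. We choose $c$ from the size requirement

$$\alpha = P_{\mu = \mu_0}\Big\{\Big|\frac{\sqrt n(\bar X - \mu_0)}{S}\Big| > c\Big\},$$

so that $c = t_{n-1,\alpha/2}$. Thus

$$A(\mu_0) = \Big\{x : \Big|\frac{\sqrt n(\bar x - \mu_0)}{s}\Big| \le t_{n-1,\alpha/2}\Big\}$$

is the acceptance region of a UMP unbiased size $\alpha$ test of $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$. By Theorem 1 it follows that

$$S(x) = \{\mu : x \in A(\mu)\} = \Big\{\mu : \bar x - \frac{s}{\sqrt n}t_{n-1,\alpha/2} \le \mu \le \bar x + \frac{s}{\sqrt n}t_{n-1,\alpha/2}\Big\}$$

is a UMA unbiased family of confidence sets at level $1 - \alpha$.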


If the measure of precision of a confidence interval is its expected length, one is 
naturally led to a consideration of unbiased confidence intervals. Pratt [79] has shown 
that the expected length of a confidence interval is the average of false coverage 
probabilities. 


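Theorem 2. Let $\Theta$ be an interval on the real line and $f_\theta$ be the PDF of $X$. Let $S(X)$ be a family of $(1 - \alpha)$-level confidence intervals of finite length; that is, let $S(X) = (\underline\theta(X), \overline\theta(X))$, and suppose that $\overline\theta(X) - \underline\theta(X)$ is (random) finite. Then

$$\int \big(\overline\theta(x) - \underline\theta(x)\big) f_\theta(x)\,dx = \int_{\theta' \ne \theta} P_\theta\{S(X) \text{ contains } \theta'\}\,d\theta' \tag{3}$$

for all $\theta \in \Theta$.


Proof. We have

$$\overline\theta - \underline\theta = \int_{\underline\theta}^{\overline\theta} d\theta'.$$

Thus for all $\theta \in \Theta$, interchanging the order of integration,

$$\begin{aligned} E_\theta\{\overline\theta(X) - \underline\theta(X)\} &= E_\theta\Big[\int_{\underline\theta(X)}^{\overline\theta(X)} d\theta'\Big] = \int\Big[\int_{\underline\theta(x)}^{\overline\theta(x)} d\theta'\Big] f_\theta(x)\,dx\\ &= \int_\Theta\Big[\int_{\{x:\ \underline\theta(x) < \theta' < \overline\theta(x)\}} f_\theta(x)\,dx\Big]\,d\theta'\\ &= \int_\Theta P_\theta\{S(X) \text{ contains } \theta'\}\,d\theta'\\ &= \int_{\theta' \ne \theta} P_\theta\{S(X) \text{ contains } \theta'\}\,d\theta', \end{aligned}$$

the last step because the single point $\theta' = \theta$ does not affect the integral.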


Remark 2. If $S(X)$ is a family of UMAU $(1 - \alpha)$-level confidence intervals, the expected length of $S(X)$ is minimal among all families of $(1 - \alpha)$-level unbiased confidence intervals. This follows since the left-hand side of (3) is the expected length of $S(X)$ when $\theta$ is the true value, and $P_\theta\{S(X) \text{ contains } \theta'\}$ is minimal [because $S(X)$ is UMAU], by Theorem 1, with respect to all families of $(1 - \alpha)$-level unbiased confidence intervals uniformly in $\theta'$ ($\theta' \ne \theta$).
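Identity (3) can be checked numerically in the simplest case. The sketch below assumes $\bar X \sim N(\theta, \sigma^2/n)$ with $\sigma$ known and the interval $\bar X \pm z_{\alpha/2}\sigma/\sqrt n$. Its coverage probability of $\theta'$ is $\Phi(d + z_{\alpha/2}) - \Phi(d - z_{\alpha/2})$ with $d = (\theta' - \theta)\sqrt n/\sigma$, and this is integrated by the trapezoidal rule and compared with the length $2z_{\alpha/2}\sigma/\sqrt n$. The truncation range, step count, and function names are assumptions made for illustration.

```python
from statistics import NormalDist

def expected_length(sigma, n, alpha):
    """Length 2 z_{alpha/2} sigma / sqrt(n) of the z-interval (here nonrandom)."""
    return 2 * NormalDist().inv_cdf(1 - alpha / 2) * sigma / n ** 0.5

def integrated_coverage(sigma, n, alpha, half_range=20.0, steps=20000):
    """Trapezoidal value of the integral over theta' of P_theta{S(X) contains theta'};
    the integrand depends on theta' - theta only, so theta itself drops out."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5
    h = 2 * half_range / steps                     # step in d = (theta' - theta) / se
    total = 0.0
    for i in range(steps + 1):
        d = -half_range + i * h
        cover = std.cdf(d + z) - std.cdf(d - z)    # P{|xbar - theta'| < z se}
        total += cover * (0.5 if i in (0, steps) else 1.0)
    return total * h * se                          # change of variables d -> theta'

print(integrated_coverage(2.0, 16, 0.05), expected_length(2.0, 16, 0.05))
```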


Since a reasonably complete discussion of UMP unbiased tests (see Section 9.5) is beyond the scope of this book, the following procedure for determining unbiased confidence intervals is sometimes quite useful (see Guenther [35]). Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous DF with PDF $f_\theta(x)$, and suppose that we seek an unbiased confidence interval for $\theta$. Following the discussion in Section 11.4, suppose that

$$T(X_1, X_2, \ldots, X_n, \theta) = T(X, \theta) = T_\theta$$

is a pivot, and suppose that the statement

$$P\{\lambda_1(\alpha) < T_\theta < \lambda_2(\alpha)\} = 1 - \alpha$$

can be converted to

$$P_\theta\{\underline\theta(X) < \theta < \overline\theta(X)\} = 1 - \alpha.$$

For $(\underline\theta, \overline\theta)$ to be unbiased, we must have

$$P(\theta, \theta') = P_\theta\{\underline\theta(X) < \theta' < \overline\theta(X)\} = 1 - \alpha \quad \text{if } \theta' = \theta \tag{4}$$

and

$$P(\theta, \theta') \le 1 - \alpha \quad \text{if } \theta' \ne \theta. \tag{5}$$

If $P(\theta, \theta')$ depends only on a function $\gamma$ of $\theta, \theta'$, we may write

$$P(\gamma) \begin{cases} = 1 - \alpha & \text{if } \theta' = \theta,\\ \le 1 - \alpha & \text{if } \theta' \ne \theta, \end{cases} \tag{6}$$

and it follows that $P(\gamma)$ has a maximum at $\theta' = \theta$.


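Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs, and suppose that we desire an unbiased confidence interval for $\sigma^2$. Then

$$T(X, \sigma^2) = \frac{(n - 1)S^2}{\sigma^2} = T_{\sigma^2}$$

has a $\chi^2(n - 1)$ distribution, and we have

$$P\Big\{\lambda_1 < (n - 1)\frac{S^2}{\sigma^2} < \lambda_2\Big\} = 1 - \alpha,$$

so that

$$P\Big\{(n - 1)\frac{S^2}{\lambda_2} < \sigma^2 < (n - 1)\frac{S^2}{\lambda_1}\Big\} = 1 - \alpha.$$

Then

$$P(\sigma^2, \sigma'^2) = P_{\sigma^2}\Big\{(n - 1)\frac{S^2}{\lambda_2} < \sigma'^2 < (n - 1)\frac{S^2}{\lambda_1}\Big\} = P\Big\{\frac{T_{\sigma^2}}{\lambda_2} < \gamma < \frac{T_{\sigma^2}}{\lambda_1}\Big\},$$

where $\gamma = \sigma'^2/\sigma^2$ and $T_{\sigma^2} \sim \chi^2(n - 1)$. Thus

$$P(\gamma) = P\{\lambda_1\gamma < T_{\sigma^2} < \lambda_2\gamma\}.$$

We want $P(1) = 1 - \alpha$ and $P(\gamma) \le 1 - \alpha$. Thus we need $\lambda_1, \lambda_2$ such that

$$P(1) = 1 - \alpha \tag{7}$$

and

$$\frac{dP(\gamma)}{d\gamma}\Big|_{\gamma = 1} = \lambda_2 f_{n-1}(\lambda_2) - \lambda_1 f_{n-1}(\lambda_1) = 0, \tag{8}$$

where $f_{n-1}$ is the PDF of $T_{\sigma^2}$. Equations (7) and (8) have been solved numerically for $\lambda_1, \lambda_2$ by several authors (see, for example, Tate and Klett [111]). Having obtained $\lambda_1, \lambda_2$ from (7) and (8), we have as the unbiased $(1 - \alpha)$-level confidence interval

$$\Big((n - 1)\frac{S^2}{\lambda_2},\ (n - 1)\frac{S^2}{\lambda_1}\Big). \tag{9}$$

Note that in this case the shortest-length confidence interval (based on $T_{\sigma^2}$) derived in Example 11.4.3, the usual equal-tails confidence interval, and (9) are all different. The length of the confidence interval (9), however, can be considerably greater than that of the shortest interval of Example 11.4.3. For large $n$ all three sets of intervals are approximately the same.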


Finally, let us briefly investigate how invariance considerations apply to confidence estimation. Let $X = (X_1, X_2, \ldots, X_n) \sim f_\theta$, $\theta \in \Theta \subseteq \mathbb{R}$. Let $G$ be a group of transformations on $\mathcal{X}$ that leaves $\mathcal{P} = \{f_\theta : \theta \in \Theta\}$ invariant. Let $S(X)$ be a $(1 - \alpha)$-level confidence set for $\theta$.


Definition 2. Let $\mathcal{P}$ be invariant under $G$, and let $S(x)$ be a confidence set for $\theta$. Then $S$ is equivariant under $G$ if for every $x \in \mathcal{X}$, $\theta \in \Theta$, and $g \in G$,

$$S(x) \ni \theta \iff S(g(x)) \ni \bar g\theta. \tag{10}$$

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \exp[-(x - \theta)], \qquad x > \theta,$$

and $= 0$ if $x \le \theta$. Let $G = \{\{a, 1\} : a \in \mathbb{R}\}$, where $\{a, 1\}x = (x_1 + a, x_2 + a, \ldots, x_n + a)$, and $G$ induces $\bar G = G$ on $\Theta = \mathbb{R}$, with $\bar g\theta = \theta + a$. The family $\{f_\theta\}$ remains invariant under $G$. Consider a confidence interval of the form

$$S(x) = \{\theta : x_{(1)} - c_1 < \theta < x_{(1)} - c_2\},$$

where $c_1, c_2$ are constants. Then

$$S(\{a, 1\}x) = \{\theta : x_{(1)} + a - c_1 < \theta < x_{(1)} + a - c_2\}.$$

Clearly,

$$S(x) \ni \theta \iff x_{(1)} + a - c_1 < \theta + a < x_{(1)} + a - c_2 \iff S(\{a, 1\}x) \ni \bar g\theta,$$

and it follows that $S(x)$ is an equivariant confidence interval.


The most useful method of constructing equivariant confidence intervals is test inversion. Inverting the acceptance region of invariant tests often leads to equivariant confidence intervals under certain conditions. Recall that a group $G$ of transformations leaves a hypothesis-testing problem invariant if $G$ leaves both $\Theta_0$ and $\Theta_1$ invariant. For each $H_0\colon \theta = \theta_0$, $\theta_0 \in \Theta$, we have a different group of transformations, $G_{\theta_0}$, which leaves the problem of testing $\theta = \theta_0$ invariant. The equivariant confidence interval, on the other hand, must be equivariant with respect to $G$, which is a much larger group since $G \supset G_{\theta_0}$ for all $\theta_0$. The relationship between an equivariant confidence set and invariant tests is more complicated when the family $\mathcal{P}$ has a nuisance parameter $\tau$.

Under certain conditions there is a relationship between equivariant confidence sets and associated invariant tests. Rather than pursue this relationship, we refer the reader to Ferguson [27, p. 262]; it is generally easy to check that (10) holds for a given confidence interval $S$ to show that $S$ is equivariant. The following example illustrates this point.


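Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. In Example 9.5.3 we showed that the test

$$\phi(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n (x_i - \bar{x})^2 \le \sigma_0^2\,\chi^2_{n-1, 1-\alpha}, \\ 0 & \text{otherwise} \end{cases}$$

is UMP invariant under the translation group for testing $H_0 : \sigma^2 \ge \sigma_0^2$ against $H_1 : \sigma^2 < \sigma_0^2$. Then the acceptance region of $\phi$ is

$$A(\sigma_0^2) = \left\{x : \sum_{i=1}^n (x_i - \bar{x})^2 > \sigma_0^2\,\chi^2_{n-1, 1-\alpha}\right\}.$$

Clearly, with $(n-1)s^2 = \sum_{i=1}^n (x_i - \bar{x})^2$,

$$x \in A(\sigma_0^2) \iff \sigma_0^2 < \frac{(n-1)s^2}{\chi^2_{n-1, 1-\alpha}},$$

and it follows that

$$S(x) = \left\{\sigma^2 : \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1, 1-\alpha}}\right\}$$

is a $(1-\alpha)$-level confidence interval (upper confidence bound) for $\sigma^2$. We show that $S$ is equivariant with respect to the scale group $\{\{0, c\} : c > 0\}$, where $\{0, c\}x = (cx_1, \ldots, cx_n)$ induces $\sigma^2 \mapsto c^2\sigma^2$. In fact,

$$S(\{0, c\}x) = \left\{\sigma^2 : \sigma^2 < \frac{c^2(n-1)s^2}{\chi^2_{n-1, 1-\alpha}}\right\}$$

and

$$\sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1, 1-\alpha}} \iff S(\{0, c\}x) \ni c^2\sigma^2 = \overline{\{0, c\}}\,\sigma^2,$$

and it follows that $S(x)$ is an equivariant confidence interval for $\sigma^2$.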
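A sketch of Example 4 in Python (NumPy and SciPy assumed), checking numerically that the upper bound for $\sigma^2$ is multiplied by $c^2$ when the data are multiplied by $c$, is unchanged by translations, and covers $\sigma^2$ with probability about $1-\alpha$. The function name `sigma2_upper_bound` and all numerical values are hypothetical; the code reads $\chi^2_{n-1,1-\alpha}$ as the point with probability $1-\alpha$ to its right, which is the convention under which the test $\phi$ has level $\alpha$.

```python
import numpy as np
from scipy.stats import chi2

def sigma2_upper_bound(x, alpha=0.05):
    """Upper confidence bound (n - 1)s^2 / chi2_{n-1,1-alpha} for sigma^2.

    chi2_{n-1,1-alpha} is taken to be the point with probability 1 - alpha to its
    right, i.e. the lower alpha-quantile chi2.ppf(alpha, n - 1).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ss = np.sum((x - x.mean()) ** 2)      # (n - 1) s^2
    return ss / chi2.ppf(alpha, n - 1)

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 12
x = rng.normal(mu, sigma, size=n)

# Scale equivariance: the bound for {0, c}x is c^2 times the bound for x.
c = 3.0
b = sigma2_upper_bound(x)
b_scaled = sigma2_upper_bound(c * x)
# Translating the data leaves the bound unchanged.
b_shifted = sigma2_upper_bound(x + 10.0)
print(b, b_scaled / b, b_shifted)

# Monte Carlo coverage of the event {sigma^2 < bound}.
reps = 20000
samples = rng.normal(mu, sigma, size=(reps, n))
ss = np.sum((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)
coverage = np.mean(sigma**2 < ss / chi2.ppf(0.05, n - 1))
print(coverage)   # close to 0.95
```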


PROBLEMS 11.5 


1. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Show that the unbiased confidence interval for $\theta$ based on the pivot $\max X_i/\theta$ coincides with the shortest-length confidence interval based on the same pivot.


2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the unbiased confidence interval for $\theta$ based on the pivot $2\sum_{i=1}^n X_i/\theta$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases} e^{-(x - \theta)} & \text{if } x > \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Find the unbiased confidence interval based on the pivot $2n[\min X_i - \theta]$.


4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. Using the pivot $T(\mu) = \sqrt{n}(\bar{X} - \mu)/S$, show that the shortest-length unbiased $(1-\alpha)$-level confidence interval for $\mu$ is the equal-tails interval $(\bar{X} - t_{n-1,\alpha/2}S/\sqrt{n},\ \bar{X} + t_{n-1,\alpha/2}S/\sqrt{n})$.
5. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \theta/x^2$, $x > \theta$, and $= 0$ otherwise. Find the shortest-length $(1-\alpha)$-level unbiased confidence interval for $\theta$ based on the pivot $\theta/X_{(1)}$.


6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a location family $\mathcal{P} = \{f_\theta(x) = f(x - \theta) : \theta \in \mathbb{R}\}$. Show that a confidence interval of the form $S(x) = \{\theta : T(x) - c_1 \le \theta \le T(x) + c_2\}$, where $T(x)$ is an equivariant estimate under the location group, is an equivariant confidence interval.


7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common scale PDF $f_\sigma(x) = (1/\sigma)f(x/\sigma)$, $\sigma > 0$. Consider the scale group $\mathcal{G} = \{\{0, b\} : b > 0\}$. If $T(x)$ is an equivariant estimate of $\sigma$, show that a confidence interval of the form

$$S(x) = \left\{\sigma : c_1 \le \frac{T(x)}{\sigma} \le c_2\right\}$$

is equivariant.


8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$, and $= 0$ otherwise. For testing $H_0 : \theta = \theta_0$ against $H_1 : \theta > \theta_0$, consider the (UMP) test

$$\phi(x) = \begin{cases} 1 & \text{if } x_{(1)} > \theta_0 - \dfrac{\ln\alpha}{n}, \\ 0 & \text{otherwise.} \end{cases}$$

Is the acceptance region of this $\alpha$-level test an equivariant $(1-\alpha)$-level confidence interval (lower bound) for $\theta$ with respect to the location group?


CHAPTER 12


General Linear Hypothesis 


12.1 INTRODUCTION 


This chapter deals with the general linear hypothesis. In a wide variety of problems 
the experimenter is interested in making inferences about a vector parameter. For 
example, he may wish to estimate the mean of a multivariate normal or to test some 
hypotheses concerning the mean vector. The problem of estimation can be solved, for 
example, by resorting to the method of maximum likelihood estimation, discussed 
in Section 8.7. In this chapter we restrict ourselves to linear model problems and 
concern ourselves mainly with problems of hypothesis testing. 

In Section 12.2 we formally describe the general model and derive a test in com- 
plete generality. In the next four sections we demonstrate the power of this test by 
solving four important testing problems. We need a considerable amount of linear 
algebra in Section 12.2. 


12.2 GENERAL LINEAR HYPOTHESIS 


A wide variety of problems of hypothesis testing can be treated under a general 
setup. In this section we state the general problem, and derive the test statistic and its 
distribution. Consider the following examples. 


Example 1. Let $Y_1, Y_2, \ldots, Y_k$ be independent RVs with $EY_i = \mu_i$, $i = 1, 2, \ldots, k$, and common variance $\sigma^2$. Also, $n_i$ observations are taken on $Y_i$, $i = 1, 2, \ldots, k$, and $\sum_{i=1}^k n_i = n$. It is required to test $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$. The case $k = 2$ has already been treated in Section 10.4. Problems of this nature arise quite naturally, for example, in agricultural experiments where one is interested in comparing the average yield when $k$ fertilizers are available.


Example 2. An experimenter observes the velocity of a particle moving along a line. He takes observations at given times $t_1, t_2, \ldots, t_n$. Let $\beta_1$ be the initial velocity of the particle and $\beta_2$ the acceleration; then the velocity at time $t$ is given by $y = \beta_1 + \beta_2 t + \varepsilon$, where $\varepsilon$ is an RV that is nonobservable (e.g., an error in measurement). In practice, the experimenter does not know $\beta_1$ and $\beta_2$ and has to use the random observations $Y_1, Y_2, \ldots, Y_n$ made at times $t_1, t_2, \ldots, t_n$, respectively, to obtain some information about the unknown parameters $\beta_1, \beta_2$.

A similar example is the case when the relation between $y$ and $t$ is governed by

$$y = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon,$$

where $t$ is a mathematical variable, $\beta_0, \beta_1, \beta_2$ are unknown parameters, and $\varepsilon$ is a nonobservable RV. The experimenter takes observations $Y_1, Y_2, \ldots, Y_n$ at predetermined values $t_1, t_2, \ldots, t_n$, respectively, and is interested in testing the hypothesis that the relation is in fact linear, that is, $\beta_2 = 0$.


Examples of the type discussed above and their much more complicated variants 
can all be treated under a general setup. To fix ideas, let us first make the following 
definition. 


Definition 1. Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ be a random column vector and $X$ be an $n \times k$ matrix, $k < n$, of known constants $x_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$. We say that the distribution of $Y$ satisfies a linear model if

$$EY = X\beta, \tag{1}$$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a vector of unknown (scalar) parameters $\beta_1, \beta_2, \ldots, \beta_k$. It is convenient to write

$$Y = X\beta + \varepsilon, \tag{2}$$

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable RVs with $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$. Relation (2) is known as a linear model. The general linear hypothesis then concerns $\beta$, namely, that $\beta$ satisfies $H_0 : H\beta = 0$, where $H$ is a known $r \times k$ matrix with $r \le k$.


In what follows we assume that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent normal RVs with common variance $\sigma^2$ and $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$. In view of (2), it follows that $Y_1, Y_2, \ldots, Y_n$ are independent normal RVs with

$$EY_i = \sum_{j=1}^k x_{ij}\beta_j \quad \text{and} \quad \operatorname{var}(Y_i) = \sigma^2, \qquad i = 1, 2, \ldots, n. \tag{3}$$

We assume that $H$ is a matrix of full rank $r$, $r \le k$, and $X$ is a matrix of full rank $k < n$. Some remarks are in order.


Remark 1. Clearly, $Y$ satisfies a linear model if the vector of means $EY = (EY_1, EY_2, \ldots, EY_n)'$ lies in a $k$-dimensional subspace generated by the linearly independent column vectors $x_1, x_2, \ldots, x_k$ of the matrix $X$. Indeed, (1) states that $EY$ is a linear combination of the known vectors $x_1, \ldots, x_k$. The general linear hypothesis $H_0 : H\beta = 0$ states that the parameters $\beta_1, \beta_2, \ldots, \beta_k$ satisfy $r$ independent homogeneous linear restrictions. It follows that under $H_0$, $EY$ lies in a $(k - r)$-dimensional subspace of the $k$-space generated by $x_1, \ldots, x_k$.


Remark 2. The assumption of normality, which is conventional, is made to compute the likelihood ratio test statistic of $H_0$ and its distribution. If the problem is to estimate $\beta$, no such assumption is needed. One can use the principle of least squares and estimate $\beta$ by minimizing the sum of squares,

$$\sum_{i=1}^n \varepsilon_i^2 = \varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta). \tag{4}$$

The minimizing value $\hat\beta(Y)$ is known as a least squares estimate of $\beta$. This is not a difficult problem and we do not discuss it here in any detail but mention only that any solution of the normal equations

$$X'X\beta = X'Y \tag{5}$$

is a least squares estimator. If the rank of $X$ is $k$ $(< n)$, then $X'X$, which has the same rank as $X$, is a nonsingular matrix that can be inverted to give a unique least squares estimator

$$\hat\beta = (X'X)^{-1}X'Y. \tag{6}$$

If the rank of $X$ is $< k$, then $X'X$ is singular and the normal equations do not have a unique solution. In the full-rank case one can show, for example, that $\hat\beta$ in (6) is unbiased for $\beta$, and if the $Y_i$'s are uncorrelated with common variance $\sigma^2$, the variance–covariance matrix of the $\hat\beta_i$'s is given by

$$E(\hat\beta - \beta)(\hat\beta - \beta)' = \sigma^2(X'X)^{-1}. \tag{7}$$
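The normal equations (5), the estimator (6), and the covariance formula (7) can be illustrated with a short simulation. This is a sketch assuming NumPy; the design matrix, the true $\beta$, and $\sigma$ are made up for illustration. Averaging $\hat\beta$ over repeated samples with $X$ held fixed should approximate $\beta$, and the sample covariance of the estimates should approximate $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(-1, 1, n)])
beta_true = np.array([1.0, 0.5, -2.0])   # hypothetical parameter values
sigma = 0.7

# Solve the normal equations (5); X has full rank k = 3, so this is (6).
XtX = X.T @ X
Y = X @ beta_true + rng.normal(0, sigma, n)
beta_hat = np.linalg.solve(XtX, X.T @ Y)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)   # generic least squares solver

# Simulate many Y's with the same X to check unbiasedness and (7).
reps = 5000
Ys = X @ beta_true + rng.normal(0, sigma, size=(reps, n))   # one replicate per row
B = np.linalg.solve(XtX, X.T @ Ys.T).T                      # reps x k estimates
cov_theory = sigma**2 * np.linalg.inv(XtX)
print(B.mean(axis=0))
print(np.cov(B, rowvar=False))
print(cov_theory)
```

Solving the normal equations directly is fine for a small, well-conditioned $X$; `numpy.linalg.lstsq`, which works from a singular value decomposition of $X$, is the more robust general choice.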


Remark 3. One can similarly compute the restricted least squares estimator of $\beta$ by the usual method of Lagrange multipliers. For example, under $H_0 : H\beta = 0$, one simply minimizes $(Y - X\beta)'(Y - X\beta)$ subject to $H\beta = 0$ to get the restricted least squares estimator $\hat{\hat\beta}$. The important point is that if $\varepsilon$ is assumed to be a multivariate normal RV with mean vector $0$ and dispersion matrix $\sigma^2 I_n$, the MLE of $\beta$ is the same as the least squares estimator. In fact, one can show that $\hat\beta_i$ is the UMVUE of $\beta_i$, $i = 1, 2, \ldots, k$, by the usual methods.
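The Lagrange multiplier method of Remark 3 can be sketched as follows, assuming NumPy. Setting the gradient of $(Y - X\beta)'(Y - X\beta) + 2\lambda'H\beta$ to zero gives the linear system $X'X\beta + H'\lambda = X'Y$, $H\beta = 0$, which is solved here in one step as a bordered system. The helper name `restricted_ls` and the quadratic-regression data are hypothetical; for $H = (0, 0, 1)$ the restricted estimate should coincide with the straight-line least squares fit.

```python
import numpy as np

def restricted_ls(X, Y, H):
    """Minimize (Y - Xb)'(Y - Xb) subject to Hb = 0 by Lagrange multipliers.

    Setting the gradient of (Y - Xb)'(Y - Xb) + 2 lam'Hb to zero gives
        X'X b + H' lam = X'Y,   H b = 0,
    which is solved as a single bordered linear system.
    """
    k, r = X.shape[1], H.shape[0]
    K = np.block([[X.T @ X, H.T], [H, np.zeros((r, r))]])
    rhs = np.concatenate([X.T @ Y, np.zeros(r)])
    sol = np.linalg.solve(K, rhs)
    return sol[:k]                      # discard the multipliers lam

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 5, n)
X = np.column_stack([np.ones(n), x, x**2])
Y = 2.0 + 0.8 * x + rng.normal(0, 1.0, n)
H = np.array([[0.0, 0.0, 1.0]])         # H0: coefficient of x^2 is 0

b_restricted = restricted_ls(X, Y, H)
# Under this H0 the restricted fit is just the straight-line least squares fit.
b_line, *_ = np.linalg.lstsq(X[:, :2], Y, rcond=None)
print(b_restricted, b_line)
```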


Example 3. Suppose that a random variable $Y$ is linearly related to a mathematical variable $x$ that is not random (see Example 2). Let $Y_1, Y_2, \ldots, Y_n$ be observations made at different known values $x_1, x_2, \ldots, x_n$ of $x$. For example, $x_1, x_2, \ldots, x_n$ may represent different levels of fertilizer, and $Y_1, Y_2, \ldots, Y_n$, respectively, the corresponding yields of a crop. Also, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ represent unobservable RVs that may be errors of measurement. Then

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

and we wish to test whether $\beta_1 = 0$, that is, whether the fertilizer levels have no effect on the yield. Here

$$\beta = (\beta_0, \beta_1)' \quad \text{and} \quad \varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'.$$

The hypothesis to be tested is $H_0 : \beta_1 = 0$, so that with $H = (0, 1)$ the null hypothesis can be written as $H_0 : H\beta = 0$. This is a problem of linear regression.

Similarly, we may assume that the regression of $Y$ on $x$ is quadratic:

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon,$$

and we may wish to test that a linear function will be sufficient to describe the relationship, that is, $\beta_2 = 0$. Here $X$ is the $n \times 3$ matrix

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad \beta = (\beta_0, \beta_1, \beta_2)', \qquad \varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)',$$

and $H$ is the $1 \times 3$ matrix $(0, 0, 1)$.

In another example of regression, the $Y$'s can be written as

$$Y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,$$

and we wish to test the hypothesis that $\beta_1 = \beta_2 = \beta_3$. In this case, $X$ is the matrix

$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{pmatrix},$$

and $H$ may be chosen to be the $2 \times 3$ matrix

$$H = \begin{pmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \end{pmatrix}.$$
Example 4. Another important example of the general linear hypothesis involves the analysis of variance. We have already derived tests of hypotheses regarding the equality of the means of two normal populations when the variances are equal. In practice, one is frequently interested in the equality of several means when the variances are the same, that is, one has $k$ samples from $N(\mu_1, \sigma^2), \ldots, N(\mu_k, \sigma^2)$, where $\sigma^2$ is unknown, and one wants to test $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ (see Example 1). Such a situation is of common occurrence in agricultural experiments. Suppose that $k$ treatments are applied to experimental units (plots), the $i$th treatment is applied to $n_i$ randomly chosen units, $i = 1, 2, \ldots, k$, $\sum_{i=1}^k n_i = n$, and the observation $y_{ij}$ represents some numerical characteristic (yield) of the $j$th experimental unit under the $i$th treatment. Suppose also that

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i, \quad i = 1, 2, \ldots, k,$$

where the $\varepsilon_{ij}$ are iid $N(0, \sigma^2)$ RVs. We are interested in testing $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$. We write

$$Y = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, \ldots, Y_{kn_k})', \qquad \beta = (\mu_1, \mu_2, \ldots, \mu_k)',$$

and

$$X = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1_{n_k} \end{pmatrix},$$

where $1_{n_i} = (1, 1, \ldots, 1)'$ is the $n_i$-vector ($i = 1, 2, \ldots, k$) each of whose elements is unity. Thus $X$ is $n \times k$. We can choose

$$H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$$

so that $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ is of the form $H\beta = 0$. Here $H$ is a $(k - 1) \times k$ matrix.

The model described in this example is frequently referred to as a one-way analysis of variance model. This is a very simple example of an analysis of variance model. Note that the matrix $X$ is of a very special type; namely, the elements of $X$ are either 0 or 1. $X$ is known as a design matrix.
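The design matrix $X$ and the hypothesis matrix $H$ of Example 4 can be built mechanically. A minimal sketch assuming NumPy; the function names `one_way_design` and `equal_means_H` and the group sizes are hypothetical.

```python
import numpy as np

def one_way_design(sizes):
    """Design matrix X (n x k) whose i-th column is 1_{n_i} on the i-th block of rows."""
    n, k = sum(sizes), len(sizes)
    X = np.zeros((n, k))
    start = 0
    for i, ni in enumerate(sizes):
        X[start:start + ni, i] = 1.0
        start += ni
    return X

def equal_means_H(k):
    """(k - 1) x k matrix whose rows are (1, -1, 0, ..., 0), ..., (1, 0, ..., 0, -1)."""
    H = np.zeros((k - 1, k))
    H[:, 0] = 1.0
    H[np.arange(k - 1), np.arange(1, k)] = -1.0
    return H

sizes = [2, 3, 1]                 # hypothetical n_1, n_2, n_3
X = one_way_design(sizes)
H = equal_means_H(len(sizes))
print(X)
print(H)
# X'X is diag(n_1, ..., n_k), and H beta = 0 exactly when all mu_i are equal.
print(X.T @ X)
print(H @ np.array([4.0, 4.0, 4.0]), H @ np.array([4.0, 5.0, 4.0]))
```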


Returning to our general model

$$Y = X\beta + \varepsilon,$$

we wish to test the null hypothesis $H_0 : H\beta = 0$. We will compute the generalized likelihood ratio test and the distribution of the test statistic. To do so, we assume that $\varepsilon$ has a multivariate normal distribution with mean vector $0$ and variance–covariance matrix $\sigma^2 I_n$, where $\sigma^2$ is unknown and $I_n$ is the $n \times n$ identity matrix. This means that $Y$ has an $n$-variate normal distribution with mean $X\beta$ and dispersion matrix $\sigma^2 I_n$ for some $\beta$ and some $\sigma^2$, both unknown. Here the parameter space $\Theta$ is the set of $(k+1)$-tuples $(\beta', \sigma^2) = (\beta_1, \beta_2, \ldots, \beta_k, \sigma^2)$, and the joint PDF of the $Y$'s is given by

$$\begin{aligned} f_{\beta, \sigma^2}(y_1, y_2, \ldots, y_n) &= \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2\right] \\ &= \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right]. \end{aligned} \tag{8}$$


Theorem 1. Consider the linear model

$$Y = X\beta + \varepsilon,$$

where $X$ is an $n \times k$ matrix $(x_{ij})$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$, of known constants and full rank $k < n$, $\beta$ is a vector of unknown parameters $\beta_1, \beta_2, \ldots, \beta_k$, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable independent normal RVs with common variance $\sigma^2$ and mean $E\varepsilon = 0$. The GLR test for testing the linear hypothesis $H_0 : H\beta = 0$, where $H$ is an $r \times k$ matrix of full rank $r \le k$, is to reject $H_0$ at level $\alpha$ if $F > F_0$, where $P_{H_0}\{F > F_0\} = \alpha$ and $F$ is the RV given by

$$F = \frac{(Y - X\hat{\hat\beta})'(Y - X\hat{\hat\beta}) - (Y - X\hat\beta)'(Y - X\hat\beta)}{(Y - X\hat\beta)'(Y - X\hat\beta)}. \tag{9}$$

In (9), $\hat\beta$ and $\hat{\hat\beta}$ are the MLEs of $\beta$ under $\Theta$ and $\Theta_0$, respectively. Moreover, the RV $[(n - k)/r]F$ has the $F$-distribution with $(r, n - k)$ d.f. under $H_0$.


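Proof. The GLR test of $H_0 : H\beta = 0$ is to reject $H_0$ if and only if $\lambda(y) < c$, where

$$\lambda(y) = \frac{\sup_{\theta \in \Theta_0} f_{\beta, \sigma^2}(y)}{\sup_{\theta \in \Theta} f_{\beta, \sigma^2}(y)}, \tag{10}$$

$\theta = (\beta', \sigma^2)'$, and $\Theta_0 = \{(\beta', \sigma^2)' : H\beta = 0\}$. Let $\hat\theta = (\hat\beta', \hat\sigma^2)'$ be the MLE of $\theta \in \Theta$, and $\hat{\hat\theta} = (\hat{\hat\beta}', \hat{\hat\sigma}^2)'$ be the MLE of $\theta$ under $H_0$, that is, when $H\beta = 0$. It is easily seen that $\hat\beta$ is the value of $\beta$ that minimizes $(y - X\beta)'(y - X\beta)$, and

$$\hat\sigma^2 = n^{-1}(y - X\hat\beta)'(y - X\hat\beta). \tag{11}$$

Similarly, $\hat{\hat\beta}$ is the value of $\beta$ that minimizes $(y - X\beta)'(y - X\beta)$ subject to $H\beta = 0$, and

$$\hat{\hat\sigma}^2 = n^{-1}(y - X\hat{\hat\beta})'(y - X\hat{\hat\beta}). \tag{12}$$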
It follows that

$$\lambda(y) = \left(\frac{\hat\sigma^2}{\hat{\hat\sigma}^2}\right)^{n/2}. \tag{13}$$

The critical region $\lambda(y) < c$ is equivalent to the region $\{\lambda(y)\}^{2/n} < c^{2/n}$, which is of the form

$$\frac{\hat{\hat\sigma}^2}{\hat\sigma^2} > c_1, \qquad c_1 = c^{-2/n}. \tag{14}$$

This may be written as

$$\frac{(y - X\hat{\hat\beta})'(y - X\hat{\hat\beta})}{(y - X\hat\beta)'(y - X\hat\beta)} > c_1, \tag{15}$$

or, equivalently, as

$$\frac{(y - X\hat{\hat\beta})'(y - X\hat{\hat\beta}) - (y - X\hat\beta)'(y - X\hat\beta)}{(y - X\hat\beta)'(y - X\hat\beta)} > c_1 - 1. \tag{16}$$

It remains to determine the distribution of the test statistic. For this purpose it is convenient to reduce the problem to the canonical form. Let $V_n$ be the vector space of the observation vector $Y$, $V_k$ be the subspace of $V_n$ generated by the column vectors $x_1, x_2, \ldots, x_k$ of $X$, and $V_{k-r}$ be the subspace of $V_k$ in which $EY$ is postulated to lie under $H_0$. We change variables from $Y_1, Y_2, \ldots, Y_n$ to $Z_1, Z_2, \ldots, Z_n$, where $Z_1, Z_2, \ldots, Z_n$ are independent normal RVs with common variance $\sigma^2$ and means $EZ_i = \omega_i$, $i = 1, 2, \ldots, k$, $EZ_i = 0$, $i = k+1, \ldots, n$. This is done as follows. Let us choose an orthonormal basis of $k - r$ column vectors $\{\alpha_i\}$ for $V_{k-r}$, say $\{\alpha_{r+1}, \alpha_{r+2}, \ldots, \alpha_k\}$. We extend this to an orthonormal basis $\{\alpha_1, \alpha_2, \ldots, \alpha_r, \alpha_{r+1}, \ldots, \alpha_k\}$ for $V_k$, and then extend once again to an orthonormal basis $\{\alpha_1, \alpha_2, \ldots, \alpha_k, \alpha_{k+1}, \ldots, \alpha_n\}$ for $V_n$. This is always possible. Let $z_1, z_2, \ldots, z_n$ be the coordinates of $y$ relative to the basis $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. Then $z_i = \alpha_i'y$ and $z = Py$, where $P$ is an orthogonal matrix with $i$th row $\alpha_i'$. Thus $EZ_i = \alpha_i'EY = \alpha_i'X\beta$, and $EZ = PX\beta$. Since $X\beta \in V_k$ (Remark 1), it follows that $\alpha_i'X\beta = 0$ for $i > k$. Similarly, under $H_0$, $X\beta \in V_{k-r} \subset V_k$, so that $\alpha_i'X\beta = 0$ for $i \le r$. Let us write $\omega = PX\beta$. Then $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0$, and under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$. Finally, from Corollary 2 of Theorem 5.4.6 it follows that $Z_1, Z_2, \ldots, Z_n$ are independent normal RVs with the same variance $\sigma^2$ and $EZ_i = \omega_i$, $i = 1, 2, \ldots, n$. We have thus transformed the problem to the
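following simpler canonical form:

$$\begin{aligned} \Omega &: Z_i \text{ are independent } N(\omega_i, \sigma^2), \quad i = 1, 2, \ldots, n, \quad \omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0, \\ H_0 &: \omega_1 = \omega_2 = \cdots = \omega_r = 0. \end{aligned} \tag{17}$$

Now

$$\begin{aligned} (y - X\beta)'(y - X\beta) &= (P'z - P'\omega)'(P'z - P'\omega) \\ &= (z - \omega)'(z - \omega) \\ &= \sum_{i=1}^k (z_i - \omega_i)^2 + \sum_{i=k+1}^n z_i^2. \end{aligned} \tag{18}$$

The quantity $(y - X\beta)'(y - X\beta)$ is minimized if we choose $\hat\omega_i = z_i$, $i = 1, 2, \ldots, k$, so that

$$(y - X\hat\beta)'(y - X\hat\beta) = \sum_{i=k+1}^n z_i^2. \tag{19}$$

Under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$, so that $(y - X\beta)'(y - X\beta)$ will be minimized if we choose $\hat{\hat\omega}_i = z_i$, $i = r+1, \ldots, k$. Thus

$$(y - X\hat{\hat\beta})'(y - X\hat{\hat\beta}) = \sum_{i=1}^r z_i^2 + \sum_{i=k+1}^n z_i^2. \tag{20}$$

It follows that

$$F = \frac{\sum_{i=1}^r Z_i^2}{\sum_{i=k+1}^n Z_i^2}.$$

Now $\sum_{i=k+1}^n Z_i^2/\sigma^2$ has a $\chi^2(n-k)$ distribution, and under $H_0$, $\sum_{i=1}^r Z_i^2/\sigma^2$ has a $\chi^2(r)$ distribution. Since $\sum_{i=1}^r Z_i^2$ and $\sum_{i=k+1}^n Z_i^2$ are independent, we see that $[(n-k)/r]F$ is distributed as $F(r, n-k)$ under $H_0$, as asserted. This completes the proof of the theorem.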
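The reduction to canonical form in the proof can be checked numerically. The sketch below, assuming NumPy and `scipy.linalg.null_space`, builds orthonormal vectors $\alpha_1, \ldots, \alpha_n$ in the order used above (first a basis of the part of $V_k$ orthogonal to $V_{k-r}$, then a basis of $V_{k-r}$, then the orthogonal complement of $V_k$), forms $z = Py$, and compares $\sum_{i \le r} z_i^2 / \sum_{i > k} z_i^2$ with $F$ computed from the two residual sums of squares in (9). The design, the hypothesis $H$, and the data are hypothetical.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
n, k = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
H = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # H0: beta_2 = beta_3 = 0, r = 2
r = H.shape[0]
y = X @ np.array([1.0, 0.3, -0.2]) + rng.normal(size=n)

N = null_space(H)                      # k x (k - r); beta = N g satisfies H beta = 0
X0 = X @ N                             # columns span V_{k-r}

# alpha_{r+1}, ..., alpha_k: orthonormal basis of V_{k-r}.
A_sub, _ = np.linalg.qr(X0)
# alpha_1, ..., alpha_r: orthonormal basis of the part of V_k orthogonal to V_{k-r}.
M = X - A_sub @ (A_sub.T @ X)
U, _, _ = np.linalg.svd(M, full_matrices=False)
A_rest = U[:, :r]
# alpha_{k+1}, ..., alpha_n: orthonormal basis of the orthogonal complement of V_k.
A_perp = null_space(np.column_stack([A_rest, A_sub]).T)

P = np.column_stack([A_rest, A_sub, A_perp]).T   # i-th row is alpha_i'
z = P @ y
F_canonical = np.sum(z[:r] ** 2) / np.sum(z[k:] ** 2)

# The same statistic from the residual sums of squares in (9).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)
g, *_ = np.linalg.lstsq(X0, y, rcond=None)
rss0 = np.sum((y - X0 @ g) ** 2)
F_rss = (rss0 - rss) / rss
print(F_canonical, F_rss)
```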


Remark 4. In practice, one does not need to find a transformation that reduces the problem to the canonical form. As will be done in the following sections, one simply computes the estimators $\hat\theta$ and $\hat{\hat\theta}$ and then computes the test statistic in any of the equivalent forms (14), (15), or (16) to apply the $F$-test.
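Following Remark 4, the test can be computed from the two least squares fits alone. Below is a sketch of a general routine, assuming NumPy and SciPy; the name `glh_f_test` is hypothetical, and imposing $H\beta = 0$ by writing $\beta = Ng$, with the columns of $N$ spanning the null space of $H$, is one convenient choice rather than the method of the text. The usage example simulates data under $H_0$ and checks that the rejection rate at level 0.05 is near the nominal value.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.stats import f as f_dist

def glh_f_test(X, y, H):
    """F test of H0: H beta = 0 in y = X beta + eps, as in Theorem 1.

    Returns F of (9), the scaled statistic [(n - k)/r] F, and its p-value
    under F(r, n - k).
    """
    n, k = X.shape
    r = H.shape[0]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    N = null_space(H)                      # H beta = 0  <=>  beta = N g
    if N.shape[1] == 0:                    # r = k: H0 forces beta = 0
        rss0 = np.sum(y**2)
    else:
        g, *_ = np.linalg.lstsq(X @ N, y, rcond=None)
        rss0 = np.sum((y - X @ N @ g) ** 2)
    F = (rss0 - rss) / rss
    stat = (n - k) / r * F
    return F, stat, f_dist.sf(stat, r, n - k)

# Usage: quadratic vs. linear regression, with data generated under H0.
rng = np.random.default_rng(5)
n = 30
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x, x**2])
H = np.array([[0.0, 0.0, 1.0]])            # H0: coefficient of x^2 is 0
reps, rejections = 4000, 0
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    _, _, p = glh_f_test(X, y, H)
    rejections += p < 0.05
print(rejections / reps)                   # should be near 0.05
```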


Remark 5. The computation of $\hat\beta$ and $\hat{\hat\beta}$ is greatly facilitated, in view of Remark 3, by using the principle of least squares. Indeed, this was done in the proof of Theorem 1 when we reduced the problem of maximum likelihood estimation to that of minimizing the sum of squares $(y - X\beta)'(y - X\beta)$.


Remark 6. The distribution of the test statistic under $H_1$ is easily determined. We note that $Z_i/\sigma \sim N(\omega_i/\sigma, 1)$ for $i = 1, 2, \ldots, r$, so that $\sum_{i=1}^r Z_i^2/\sigma^2$ has a noncentral chi-square distribution with $r$ d.f. and noncentrality parameter $\delta = \sum_{i=1}^r \omega_i^2/\sigma^2$. It follows that $[(n-k)/r]F$ has a noncentral $F$-distribution with d.f. $(r, n-k)$ and noncentrality parameter $\delta$. Under $H_0$, $\delta = 0$, so that $[(n-k)/r]F$ has a central $F(r, n-k)$ distribution. Since $\sum_{i=1}^r \omega_i^2 = \sum_{i=1}^r (EZ_i)^2$, it follows from (19) and (20) that if we replace each observation $Y_i$ by its expected value in the numerator of (16), we get $\sigma^2\delta$.


Remark 7. The general linear hypothesis makes use of the assumption of common variance. For instance, in Example 4, $Y_{ij} \sim N(\mu_i, \sigma^2)$, $j = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, k$. Let us suppose instead that $Y_{ij} \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, k$. Then we need to test that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ before we can apply Theorem 1. The case $k = 2$ has already been considered in Section 10.3. For the case where $k > 2$ one can show that a UMP unbiased test does not exist. A large-sample approximation is described by Lehmann [62, pp. 376–377]. It is beyond the scope of this book to consider the effects of departures from the underlying assumptions. We refer the reader to Scheffé [99, Chap. 10] for a discussion of this topic.


PROBLEMS 12.2 


1. Show that any solution of the normal equations (5) minimizes the sum of squares $(Y - X\beta)'(Y - X\beta)$.


2. Show that the least squares estimator given in (6) is an unbiased estimator of $\beta$. If the RVs $Y_i$ are uncorrelated with common variance $\sigma^2$, show that the covariance matrix of the $\hat\beta_i$'s is given by (7).


3. Under the assumption that $\varepsilon$ [in model (2)] has a multivariate normal distribution with mean $0$ and dispersion matrix $\sigma^2 I_n$, show that the least squares estimators and the MLEs of $\beta$ coincide.


4. Prove statements (11) and (12).


5. Determine the expression for the least squares estimator of $\beta$ subject to $H\beta = 0$.


12.3 REGRESSION MODEL 


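In this section we consider a simple linear regression model as a special case of the general linear hypothesis and show how some inferential questions about the parameters of the regression equation can be answered. Let $x_1, x_2, \ldots, x_n$ be $n$ given numbers, and suppose that

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \tag{1}$$

where $\beta_0, \beta_1$ are unknown parameters and the $\varepsilon_i$ are independent normal RVs with $E\varepsilon_i = 0$ and $\operatorname{var}(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, $\sigma^2$ is assumed to be unknown. Our object is to test hypotheses concerning $\beta_0$ and $\beta_1$ and to construct confidence intervals for $\beta_0$ and $\beta_1$. Rewriting (1) in the usual fashion, we have

$$Y = X\beta + \varepsilon, \tag{2}$$

where

$$\beta = (\beta_0, \beta_1)' \quad \text{and} \quad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.$$

Clearly, $Y_1, Y_2, \ldots, Y_n$ are independent normal RVs with $EY_i = \beta_0 + \beta_1 x_i$ and $\operatorname{var}(Y_i) = \sigma^2$, $i = 1, 2, \ldots, n$, and $Y$ is an $n$-variate normal random vector with mean $X\beta$ and variance $\sigma^2 I_n$. The joint PDF of $Y$ is given by

$$f(y; \beta_0, \beta_1, \sigma^2) = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right\}. \tag{3}$$

It easily follows that the MLEs for $\beta_0$, $\beta_1$, and $\sigma^2$ are given by

$$\hat\beta_0 = \frac{\sum_{i=1}^n Y_i}{n} - \hat\beta_1\bar{x} = \bar{Y} - \hat\beta_1\bar{x}, \tag{4}$$

$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \tag{5}$$

and

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2, \tag{6}$$

where $\bar{x} = n^{-1}\sum_{i=1}^n x_i$.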
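If we wish to test $H_0 : \beta_1 = 0$, we take $H = (0, 1)$, so that the model is a special case of the general linear hypothesis with $k = 2$, $r = 1$. Under $H_0$ the MLEs are

$$\hat{\hat\beta}_0 = \frac{\sum_{i=1}^n Y_i}{n} = \bar{Y} \tag{7}$$

and

$$\hat{\hat\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})^2. \tag{8}$$

Thus

$$F = \frac{\sum_{i=1}^n (Y_i - \bar{Y})^2 - \sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2}{\sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2} = \frac{\hat\beta_1^2\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2}. \tag{9}$$

From Theorem 12.2.1, the statistic $[(n-2)/1]F$ has a central $F(1, n-2)$ distribution under $H_0$. Since $F(1, n-2)$ is the distribution of the square of a $t(n-2)$ RV, the likelihood ratio test rejects $H_0$ if

$$\frac{|\hat\beta_1|\left[(n-2)\sum_{i=1}^n (x_i - \bar{x})^2\right]^{1/2}}{\left[\sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2\right]^{1/2}} > c_0, \tag{10}$$

where $c_0$ is determined from $t$-tables for $n - 2$ d.f.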
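For the test of $\beta_1 = 0$, the identity $[(n-2)/1]F = t^2$ linking (9) and (10) can be confirmed numerically. A minimal sketch assuming NumPy and SciPy, with hypothetical data; the helper `slope_test` is illustrative, and `scipy.stats.linregress` is used only as an independent check of the two-sided $p$-value for zero slope.

```python
import numpy as np
from scipy import stats

def slope_test(x, y):
    """Return F of (9) and the statistic of (10) for H0: beta_1 = 0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx      # (5)
    b0 = y.mean() - b1 * x.mean()                            # (4)
    rss = np.sum((y - b0 - b1 * x) ** 2)                     # residual sum of squares
    F = b1**2 * sxx / rss                                    # (9)
    t = abs(b1) * np.sqrt((n - 2) * sxx) / np.sqrt(rss)      # (10)
    return F, t

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 25)
y = 1.0 + 0.15 * x + rng.normal(0, 1, x.size)            # hypothetical data
F, t = slope_test(x, y)
n = x.size
p = 2 * stats.t.sf(t, n - 2)                             # two-sided p-value
res = stats.linregress(x, y)
print(t, p, res.pvalue)
```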


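For testing $H_0 : \beta_0 = 0$, we choose $H = (1, 0)$ so that the model is again a special case of the general linear hypothesis. In this case

$$\hat{\hat\beta}_1 = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2}$$

and

$$\hat{\hat\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - \hat{\hat\beta}_1 x_i)^2. \tag{11}$$

It follows that

$$F = \frac{\sum_{i=1}^n (Y_i - \hat{\hat\beta}_1 x_i)^2 - \sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2}{\sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2}, \tag{12}$$

and since

$$\hat{\hat\beta}_1 = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}) + n\bar{x}\bar{Y}}{\sum_{i=1}^n x_i^2} = \frac{\hat\beta_1\sum_{i=1}^n (x_i - \bar{x})^2 + n\bar{x}\bar{Y}}{\sum_{i=1}^n x_i^2} = \hat\beta_1 + \frac{n\hat\beta_0\bar{x}}{\sum_{i=1}^n x_i^2}, \tag{13}$$

we can write the numerator of $F$ in closed form. Let $e_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i = Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i$ denote the residuals, so that $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n x_i e_i = 0$ by the normal equations. By (13), $Y_i - \hat{\hat\beta}_1 x_i = e_i + \hat\beta_0\left(1 - n\bar{x}x_i/\sum_{j=1}^n x_j^2\right)$, and the cross-product terms vanish, so that

$$\begin{aligned} \sum_{i=1}^n (Y_i - \hat{\hat\beta}_1 x_i)^2 - \sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2 &= \hat\beta_0^2\sum_{i=1}^n \left(1 - \frac{n\bar{x}x_i}{\sum_{j=1}^n x_j^2}\right)^2 \\ &= \hat\beta_0^2\left(n - \frac{n^2\bar{x}^2}{\sum_{i=1}^n x_i^2}\right) = \frac{n\hat\beta_0^2\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n x_i^2}. \end{aligned} \tag{14}$$

It follows from Theorem 12.2.1 that the statistic

$$\frac{\hat\beta_0\left[n\sum_{i=1}^n (x_i - \bar{x})^2 \big/ \sum_{i=1}^n x_i^2\right]^{1/2}}{\left[\sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2/(n-2)\right]^{1/2}} \tag{15}$$

has a central $t$-distribution with $n - 2$ d.f. under $H_0 : \beta_0 = 0$. The rejection region is therefore given by

$$\frac{|\hat\beta_0|\left[n\sum_{i=1}^n (x_i - \bar{x})^2 \big/ \sum_{i=1}^n x_i^2\right]^{1/2}}{\left[\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2/(n-2)\right]^{1/2}} > c_0, \tag{16}$$

where $c_0$ is determined from the tables of the $t(n-2)$ distribution for a given level of significance $\alpha$.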
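The algebra in (13) and (14) for the test of $\beta_0 = 0$ can be verified on any data set. This sketch assumes NumPy and SciPy and uses hypothetical data; it checks (13), checks that the difference of residual sums of squares equals the closed form (14), and checks that the square of the statistic (15) equals $(n-2)F$. It also compares (15) with the familiar form $\hat\beta_0/\text{SE}(\hat\beta_0)$, using the identity $\sum x_i^2/[n\sum(x_i - \bar x)^2] = 1/n + \bar x^2/\sum(x_i - \bar x)^2$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(1, 6, 20)
y = 0.4 + 0.5 * x + rng.normal(0, 0.3, x.size)   # hypothetical data
n = x.size

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
rss = np.sum((y - b0 - b1 * x) ** 2)             # unrestricted residual sum of squares

# Restricted fit under H0: beta_0 = 0 (regression through the origin).
b1_r = np.sum(x * y) / np.sum(x**2)
rss0 = np.sum((y - b1_r * x) ** 2)

F = (rss0 - rss) / rss                                              # (12)
numerator = n * b0**2 * sxx / np.sum(x**2)                          # closed form (14)
t = b0 * np.sqrt(n * sxx / np.sum(x**2)) / np.sqrt(rss / (n - 2))   # statistic (15)
se_b0 = np.sqrt(rss / (n - 2) * (1 / n + x.mean() ** 2 / sxx))
p = 2 * stats.t.sf(abs(t), n - 2)
print(t, b0 / se_b0, p)
```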


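For testing $H_0 : \beta_0 = \beta_1 = 0$, we choose $H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ so that the model is again a special case of the general linear hypothesis with $r = 2$. In this case

$$\hat{\hat\sigma}^2 = \frac{1}{n}\sum_{i=1}^n Y_i^2 \tag{17}$$

and

$$\begin{aligned} F &= \frac{\sum_{i=1}^n Y_i^2 - \sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2}{\sum_{i=1}^n (Y_i - \bar{Y} + \hat\beta_1\bar{x} - \hat\beta_1 x_i)^2} \\ &= \frac{n\bar{Y}^2 + \hat\beta_1^2\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2} = \frac{n(\hat\beta_0 + \hat\beta_1\bar{x})^2 + \hat\beta_1^2\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}. \end{aligned} \tag{18}$$

From Theorem 12.2.1, the statistic $[(n-2)/2]F$ has a central $F(2, n-2)$ distribution under $H_0 : \beta_0 = \beta_1 = 0$. It follows that the level-$\alpha$ rejection region for $H_0$ is given by

$$\frac{n-2}{2}\,F > c_0, \tag{19}$$

where $F$ is given by (18) and $c_0$ is the upper $\alpha$ percent point of the $F(2, n-2)$ distribution.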
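A similar check for the joint hypothesis $\beta_0 = \beta_1 = 0$: the sketch below (NumPy and SciPy assumed, data hypothetical) confirms that the two expressions in (18) agree and computes the level-0.05 critical value $c_0$ of (19) from the $F(2, n-2)$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.uniform(0, 4, 15)
y = 0.2 + 0.1 * x + rng.normal(0, 0.5, x.size)   # hypothetical data
n = x.size

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
rss = np.sum((y - b0 - b1 * x) ** 2)

# Under H0: beta_0 = beta_1 = 0 the restricted residual sum of squares is sum(y^2).
F_def = (np.sum(y**2) - rss) / rss                                # first form of (18)
F_closed = (n * (b0 + b1 * x.mean()) ** 2 + b1**2 * sxx) / rss    # last form of (18)
stat = (n - 2) / 2 * F_closed
c0 = stats.f.ppf(0.95, 2, n - 2)                                  # alpha = 0.05 in (19)
print(stat, c0, "reject" if stat > c0 else "accept")
```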


Remark 1. It is quite easy to modify the analysis above to obtain tests of the null hypotheses $\beta_0 = \beta_0^0$, $\beta_1 = \beta_1^0$, and $(\beta_0, \beta_1)' = (\beta_0^0, \beta_1^0)'$, where $\beta_0^0, \beta_1^0$ are given real numbers (Problem 4).


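Remark 2. The confidence intervals for $\beta_0, \beta_1$ are also easily obtained. One can show that a $(1-\alpha)$-level confidence interval for $\beta_0$ is given by

$$\left(\hat\beta_0 - t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^n x_i^2\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n(n-2)\sum_{i=1}^n (x_i - \bar{x})^2}},\ \hat\beta_0 + t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^n x_i^2\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n(n-2)\sum_{i=1}^n (x_i - \bar{x})^2}}\right) \tag{20}$$

and that for $\beta_1$ is given by

$$\left(\hat\beta_1 - t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{(n-2)\sum_{i=1}^n (x_i - \bar{x})^2}},\ \hat\beta_1 + t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{(n-2)\sum_{i=1}^n (x_i - \bar{x})^2}}\right). \tag{21}$$

Similarly, one can obtain confidence sets for $(\beta_0, \beta_1)'$ from the likelihood ratio test of $(\beta_0, \beta_1)' = (\beta_0^0, \beta_1^0)'$. It can be shown that the collection of sets of points $(\beta_0, \beta_1)'$ satisfying

$$\frac{n-2}{2}\cdot\frac{n(\hat\beta_0 - \beta_0)^2 + 2n\bar{x}(\hat\beta_0 - \beta_0)(\hat\beta_1 - \beta_1) + \sum_{i=1}^n x_i^2\,(\hat\beta_1 - \beta_1)^2}{\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2} \le F_{2, n-2, \alpha} \tag{22}$$

is a $(1-\alpha)$-level collection of confidence sets (ellipsoids) for $(\beta_0, \beta_1)'$ centered at $(\hat\beta_0, \hat\beta_1)'$.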
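The intervals (20) and (21) can be checked by simulation. A minimal sketch assuming NumPy and SciPy; the helper `coef_intervals`, the design points, and the true parameter values are hypothetical. Over repeated samples each interval should cover its parameter in about 95 percent of cases.

```python
import numpy as np
from scipy import stats

def coef_intervals(x, y, alpha=0.05):
    """Intervals (20) for beta_0 and (21) for beta_1."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    rss = np.sum((y - b0 - b1 * x) ** 2)
    t = stats.t.ppf(1 - alpha / 2, n - 2)          # t_{n-2, alpha/2}
    h0 = t * np.sqrt(np.sum(x**2) * rss / (n * (n - 2) * sxx))
    h1 = t * np.sqrt(rss / ((n - 2) * sxx))
    return (b0 - h0, b0 + h0), (b1 - h1, b1 + h1)

rng = np.random.default_rng(9)
beta0, beta1, sigma = 1.0, 0.5, 0.8               # hypothetical true values
x = np.linspace(0, 5, 12)
reps = 10000
hits0 = hits1 = 0
for _ in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, x.size)
    (l0, u0), (l1, u1) = coef_intervals(x, y)
    hits0 += l0 <= beta0 <= u0
    hits1 += l1 <= beta1 <= u1
print(hits0 / reps, hits1 / reps)   # both close to 0.95
```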


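Remark 3. Sometimes interest lies in constructing a confidence interval on the unknown linear regression function $E\{Y \mid x_0\} = \beta_0 + \beta_1 x_0$ for a given value $x_0$ of $x$, or on a value of $Y$ given $x = x_0$. We assume that $x_0$ is a value of $x$ distinct from $x_1, x_2, \ldots, x_n$. Clearly, $\hat\beta_0 + \hat\beta_1 x_0$ is the maximum likelihood estimator of $\beta_0 + \beta_1 x_0$. This is also the best linear unbiased estimator. Let us write $\hat{E}\{Y \mid x_0\} = \hat\beta_0 + \hat\beta_1 x_0$. Then

$$\hat{E}\{Y \mid x_0\} = \bar{Y} - \hat\beta_1\bar{x} + \hat\beta_1 x_0,$$

which is clearly a linear function of the normal RVs $Y_i$. It follows that $\hat{E}\{Y \mid x_0\}$ is also normally distributed with mean $E(\hat\beta_0 + \hat\beta_1 x_0) = \beta_0 + \beta_1 x_0$ and variance

$$\begin{aligned} \operatorname{var}(\hat{E}\{Y \mid x_0\}) &= E(\hat\beta_0 - \beta_0 + \hat\beta_1 x_0 - \beta_1 x_0)^2 \\ &= \operatorname{var}(\hat\beta_0) + x_0^2\operatorname{var}(\hat\beta_1) + 2x_0\operatorname{cov}(\hat\beta_0, \hat\beta_1) \\ &= \sigma^2\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right] \end{aligned} \tag{23}$$

(see Problem 6). It follows that

$$\frac{\hat\beta_0 + \hat\beta_1 x_0 - \beta_0 - \beta_1 x_0}{\sigma\left\{(1/n) + \left[(\bar{x} - x_0)^2 \big/ \sum_{i=1}^n (x_i - \bar{x})^2\right]\right\}^{1/2}} \tag{24}$$

is $N(0, 1)$. But $\sigma$ is not known, so that we cannot use (24) to construct a confidence interval for $E\{Y \mid x_0\}$. Since $n\hat\sigma^2/\sigma^2$ is a $\chi^2(n-2)$ RV and $n\hat\sigma^2/\sigma^2$ is independent of $\hat\beta_0 + \hat\beta_1 x_0$ (why?), it follows that

$$\frac{\sqrt{n-2}\,(\hat\beta_0 + \hat\beta_1 x_0 - \beta_0 - \beta_1 x_0)}{\hat\sigma\left\{1 + n\left[(\bar{x} - x_0)^2 \big/ \sum_{i=1}^n (x_i - \bar{x})^2\right]\right\}^{1/2}} \tag{25}$$

has a $t(n-2)$ distribution. Thus a $(1-\alpha)$-level confidence interval for $\beta_0 + \beta_1 x_0$ is given by

$$\left(\hat\beta_0 + \hat\beta_1 x_0 - t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]},\ \hat\beta_0 + \hat\beta_1 x_0 + t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]}\right). \tag{26}$$

In a similar manner, one can show (Problem 7) that

$$\left(\hat\beta_0 + \hat\beta_1 x_0 - t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{n+1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]},\ \hat\beta_0 + \hat\beta_1 x_0 + t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{n+1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]}\right) \tag{27}$$

is a $(1-\alpha)$-level confidence interval for $Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon$, that is, for a new observation $Y_0$ of $Y$ at $x_0$.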
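The difference between (26) and (27) is easiest to see by simulation: the first interval targets the fixed number $\beta_0 + \beta_1 x_0$, the second a new observation $Y_0$ that carries its own error $\varepsilon$. This sketch assumes NumPy and SciPy; the helper `response_intervals` and all numerical values are hypothetical. Note that $\hat\sigma$ here is the MLE (6), with divisor $n$, as in the text.

```python
import numpy as np
from scipy import stats

def response_intervals(x, y, x0, alpha=0.05):
    """Interval (26) for beta_0 + beta_1 x0 and interval (27) for a new Y0 at x0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / n)   # MLE (6)
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    d = (x.mean() - x0) ** 2 / sxx
    center = b0 + b1 * x0
    h_mean = t * sigma_hat * np.sqrt(n / (n - 2) * (1 / n + d))
    h_pred = t * sigma_hat * np.sqrt(n / (n - 2) * ((n + 1) / n + d))
    return (center - h_mean, center + h_mean), (center - h_pred, center + h_pred)

rng = np.random.default_rng(10)
beta0, beta1, sigma, x0 = 1.0, 0.5, 0.8, 7.0      # hypothetical values
x = np.linspace(0, 5, 10)
reps = 10000
hit_mean = hit_pred = 0
for _ in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, x.size)
    y0 = beta0 + beta1 * x0 + rng.normal(0, sigma)   # new, independent observation at x0
    (lm, um), (lp, up) = response_intervals(x, y, x0)
    hit_mean += lm <= beta0 + beta1 * x0 <= um
    hit_pred += lp <= y0 <= up
print(hit_mean / reps, hit_pred / reps)   # both close to 0.95
```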


Remark 4. The simple regression model (2) considered above can be generalized in many directions. Thus we may consider $EY$ as a polynomial in $x$ of a degree higher than 1, or we may regard $EY$ as a function of several variables. Some of these generalizations will be taken up in the problems.


Remark 5. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$, and $\operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2$. In Section 7.7 we computed the PDF of the sample correlation coefficient $R$ and showed (Remark 7.7.4) that the statistic

$$T = R\sqrt{\frac{n-2}{1-R^2}} \tag{28}$$

has a $t(n-2)$ distribution, provided that $\rho = 0$. If we wish to test $\rho = 0$, that is, the independence of two jointly distributed normal RVs, we can base a test on the statistic $T$. Essentially, we are testing that the population covariance is 0, which implies that the population regression coefficients are 0. Thus we are testing, in particular, that $\beta_1 = 0$. It is therefore not surprising that (28) is identical with (10). We emphasize that we derived (28) for a bivariate normal population, but (10) was derived by taking the $x$'s as fixed and the distribution of the $Y$'s as normal. Note that for a bivariate normal population, $E\{Y \mid x\} = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ is linear, consistent with our model (1) or (2).
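The statement that (28) is identical with (10) can be checked numerically, up to the sign of $R$, since (10) uses $|\hat\beta_1|$. A sketch assuming NumPy, with a hypothetical bivariate sample.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 30
x = rng.normal(0, 1, n)
y = 0.3 * x + rng.normal(0, 1, n)     # hypothetical bivariate sample

R = np.corrcoef(x, y)[0, 1]
T = R * np.sqrt((n - 2) / (1 - R**2))                   # (28)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
rss = np.sum((y - b0 - b1 * x) ** 2)
t10 = abs(b1) * np.sqrt((n - 2) * sxx) / np.sqrt(rss)   # (10)
print(T, t10)
```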


Example 1. Let us assume that the following data satisfy a linear regression 
model: 




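$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

x    0       1       2       3        4       5
y    0.475   1.007   0.838   -0.618   1.378   0.943

Let us test the null hypothesis that $\beta_1 = 0$. We have

$$\bar{x} = 2.5, \qquad \sum_{i=0}^5 (x_i - \bar{x})^2 = 17.5, \qquad \bar{y} = 0.671,$$

$$\sum_{i=0}^5 (x_i - \bar{x})(y_i - \bar{y}) = 0.9985,$$

$$\hat\beta_1 = 0.0571, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 0.5279,$$

$$\sum_{i=0}^5 (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = 2.3576,$$

and

$$\frac{|\hat\beta_1|\left[(n-2)\sum_{i=0}^5 (x_i - \bar{x})^2\right]^{1/2}}{\left[\sum_{i=0}^5 (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2\right]^{1/2}} = 0.311.$$

Since $t_{n-2,\alpha/2} = t_{4, 0.025} = 2.776 > 0.311$, we accept $H_0$ at level $\alpha = 0.05$.

Let us next find a 95 percent confidence interval for $E\{Y \mid x = 7\}$. This is given by (26) with $x_0 = 7$. We have

$$t_{4, 0.025}\,\hat\sigma\sqrt{\frac{n}{n-2}\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=0}^5 (x_i - \bar{x})^2}\right]} = 2.776\sqrt{\frac{2.3576}{4}\left[\frac{1}{6} + \frac{(2.5 - 7)^2}{17.5}\right]} = 2.4521,$$

$$\hat\beta_0 + \hat\beta_1 x_0 = 0.5279 + 0.0571 \times 7 = 0.9276,$$

so that the 95 percent confidence interval is $(-1.5245, 3.3797)$.

(The data were produced from Table ST6, random numbers with $\mu = 0$, $\sigma = 1$, by letting $\beta_0 = 1$ and $\beta_1 = 0$, so that $E\{Y \mid x\} = \beta_0 + \beta_1 x = 1$, which indeed lies in the interval.)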
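The computations of Example 1 can be reproduced with a few lines of Python, assuming NumPy and SciPy. `scipy.stats.t.ppf(0.975, 4)` returns about 2.7764, slightly more precise than the table value 2.776, so the last digits of the interval may differ slightly from a hand computation.

```python
import numpy as np
from scipy import stats

x = np.arange(6, dtype=float)
y = np.array([0.475, 1.007, 0.838, -0.618, 1.378, 0.943])
n = x.size

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()
rss = np.sum((y - b0 - b1 * x) ** 2)
t10 = abs(b1) * np.sqrt((n - 2) * sxx) / np.sqrt(rss)   # statistic (10)
t_crit = stats.t.ppf(0.975, n - 2)                       # t_{4, 0.025}

# Interval (26) at x0 = 7; note hat(sigma)^2 * n / (n - 2) = rss / (n - 2).
x0 = 7.0
half = t_crit * np.sqrt(rss / (n - 2) * (1 / n + (x.mean() - x0) ** 2 / sxx))
center = b0 + b1 * x0
print(round(b1, 4), round(b0, 4), round(rss, 4), round(t10, 3))
print(round(center - half, 3), round(center + half, 3))
```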


PROBLEMS 12.3 


1. Prove statements (4), (5), and (6). 
2. Prove statements (7) and (8). 


3. Prove statement (11). 
4. Obtain tests of the null hypotheses $\beta_0 = \beta_0^0$, $\beta_1 = \beta_1^0$, and $(\beta_0, \beta_1)' = (\beta_0^0, \beta_1^0)'$, where $\beta_0^0, \beta_1^0$ are given real numbers.


5. Obtain the confidence intervals for $\beta_0$ and $\beta_1$ as given in (20) and (21), respectively.


6. Derive the expression for $\operatorname{var}(\hat{E}\{Y \mid x_0\})$ as given in (23).


7. Show that the interval given in (27) is a $(1-\alpha)$-level confidence interval for $Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon$, a new observation of $Y$ at $x_0$.


8. Suppose that the regression of $Y$ on the (mathematical) variable $x$ is a quadratic

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$$

where $\beta_0, \beta_1, \beta_2$ are unknown parameters, $x_1, x_2, \ldots, x_n$ are known values of $x$, and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are unobservable RVs that are assumed to be independently normally distributed with common mean 0 and common variance $\sigma^2$ (see Example 12.2.3). Assume that the coefficient vectors $(x_1^k, x_2^k, \ldots, x_n^k)$, $k = 0, 1, 2$, are linearly independent. Write the normal equations for estimating the $\beta$'s and derive the generalized likelihood ratio test of $\beta_2 = 0$.


9. Suppose that the $Y$'s can be written as

$$Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i,$$

where $x_{i1}, x_{i2}, x_{i3}$ are three mathematical variables, and the $\varepsilon_i$ are iid $N(0, 1)$ RVs. Assuming that the matrix $X$ (see Example 12.2.3) is of full rank, write the normal equations and derive the likelihood ratio test of the null hypothesis $H_0 : \beta_1 = \beta_2 = \beta_3$.


10. The following table gives the weight $Y$ (grams) of a crystal suspended in a saturated solution against the time suspended $T$ (days).

Time, T (days)      0     1     2     3     4     5     6
Weight, Y (grams)   0.4   0.7   1.1   1.6   1.9   2.3   2.6

(a) Find the linear regression line of $Y$ on $T$.

(b) Test the hypothesis that $\beta_0 = 0$ in the linear regression model $Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$.

(c) Obtain a 0.95 level confidence interval for $\beta_0$.


12.4 ONE-WAY ANALYSIS OF VARIANCE


In this section we return to the problem of one-way analysis of variance considered in Examples 12.2.1 and 12.2.4. Consider the model

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i, \quad i = 1, 2, \ldots, k, \tag{1}$$
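as described in Example 12.2.4. In matrix notation we write

$$Y = X\beta + \varepsilon, \tag{2}$$

where

$$Y = (Y_{11}, \ldots, Y_{1n_1}, Y_{21}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, \ldots, Y_{kn_k})', \qquad \beta = (\mu_1, \mu_2, \ldots, \mu_k)',$$

$$X = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1_{n_k} \end{pmatrix},$$

and

$$\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{k1}, \varepsilon_{k2}, \ldots, \varepsilon_{kn_k})'.$$

As in Example 12.2.4, $Y$ is a vector of $n$ observations ($n = \sum_{i=1}^k n_i$), whose components $Y_{ij}$ are subject to random error $\varepsilon_{ij} \sim N(0, \sigma^2)$, $\beta$ is a vector of $k$ unknown parameters, and $X$ is a design matrix. We wish to find a test of $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ against all alternatives. We may write $H_0$ in the form $H\beta = 0$, where $H$ is a $(k-1) \times k$ matrix of rank $k - 1$, which can be chosen to be

$$H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$

Let us write $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ under $H_0$. The joint PDF of $Y$ is given by

$$f(y; \mu_1, \ldots, \mu_k, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^k\sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2\right], \tag{3}$$

and under $H_0$ by

$$f(y; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^k\sum_{j=1}^{n_i} (y_{ij} - \mu)^2\right]. \tag{4}$$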
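It is easy to check that the MLEs are

(5) $\hat\mu_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = \bar y_{i\cdot}, \quad i = 1, 2, \ldots, k,$

(6) $\hat\sigma^2 = \dfrac{1}{n}\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2,$

(7) $\hat{\hat\mu} = \dfrac{1}{n}\sum_{i=1}^k\sum_{j=1}^{n_i} y_{ij} = \bar y,$

and

(8) $\hat{\hat\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y)^2.$

By Theorem 12.2.1, the likelihood ratio test is to reject $H_0$ if

(9) $\dfrac{\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y)^2 - \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2}{\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2} \cdot \dfrac{n - k}{k - 1} > F_\alpha,$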


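where $F_\alpha$ is the upper α percent point in the $F(k - 1, n - k)$ distribution. Since

(10) $\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y)^2 = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot} + \bar y_{i\cdot} - \bar y)^2 = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2 + \sum_{i=1}^k n_i(\bar y_{i\cdot} - \bar y)^2,$

the cross-product term vanishing because $\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot}) = 0$ for each i, we may rewrite (9) as

(11) $\dfrac{\sum_{i=1}^k n_i(\bar y_{i\cdot} - \bar y)^2/(k - 1)}{\sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2/(n - k)} > F_\alpha.$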
It is usual to call the sum of squares in the numerator of (11) the between sum of squares (BSS), and the sum of squares in the denominator of (11) the within sum of squares (WSS). The results are conveniently displayed in an analysis of variance table in the following form:


One-Way Analysis of Variance

Source of    Sum of Squares                                                     Degrees of   Mean Sum of Squares   F-Ratio
Variation                                                                       Freedom
Between      $BSS = \sum_{i=1}^k n_i(\bar y_{i\cdot} - \bar y)^2$               $k - 1$      $BSS/(k - 1)$         $\dfrac{BSS/(k - 1)}{WSS/(n - k)}$
Within       $WSS = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i\cdot})^2$   $n - k$      $WSS/(n - k)$
Mean         $n\bar y^2$                                                        1
Total        $TSS = \sum_{i=1}^k\sum_{j=1}^{n_i} y_{ij}^2$                      $n$

The third row, "Mean," has been included to make the total of the second column add up to the total sum of squares (TSS), $\sum_{i=1}^k\sum_{j=1}^{n_i} y_{ij}^2$.


Example 1. The lifetimes (in hours) of samples from three different brands of batteries, $Y_1$, $Y_2$, and $Y_3$, were recorded, with the following results:

$Y_1$    $Y_2$    $Y_3$
40       60       60
30       40       50
50       55       70
50       65       65
30                75
                  40

We wish to test whether the three brands have different average lifetimes. We will assume that the three samples come from normal populations with common (unknown) standard deviation σ.

From the data $n_1 = 5$, $n_2 = 4$, $n_3 = 6$, $n = 15$, and

$\bar y_1 = \dfrac{200}{5} = 40, \quad \bar y_2 = \dfrac{220}{4} = 55, \quad \bar y_3 = \dfrac{360}{6} = 60,$

$\sum_{j=1}^{5}(y_{1j} - \bar y_1)^2 = 400, \quad \sum_{j=1}^{4}(y_{2j} - \bar y_2)^2 = 350, \quad \sum_{j=1}^{6}(y_{3j} - \bar y_3)^2 = 850.$

Also, the grand mean is

$\bar y = \dfrac{200 + 220 + 360}{15} = \dfrac{780}{15} = 52.$

Thus

$BSS = 5(40 - 52)^2 + 4(55 - 52)^2 + 6(60 - 52)^2 = 1140$

and

$WSS = 400 + 350 + 850 = 1600.$

Analysis of Variance
Source     SS     d.f.   MSS      F-Ratio
Between    1140   2      570      570/133.33 = 4.28
Within     1600   12     133.33

Choosing α = 0.05, we see that $F_\alpha = F_{2,12,0.05} = 3.89$. Since 4.28 > 3.89, we reject $H_0\colon \mu_1 = \mu_2 = \mu_3$ at level α = 0.05.
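The arithmetic of Example 1 can be reproduced with a short Python sketch. It uses only the standard library; the function name `one_way_anova` is our own, and the critical value $F_{2,12,0.05} = 3.89$ still has to come from tables.

```python
def one_way_anova(groups):
    """Return (BSS, WSS, F) for the one-way model (1); `groups` holds one list per treatment."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between sum of squares: sum_i n_i (ybar_i. - ybar)^2, the numerator of (11)
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within sum of squares: sum_i sum_j (y_ij - ybar_i.)^2, the denominator of (11)
    wss = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    f_ratio = (bss / (k - 1)) / (wss / (n - k))
    return bss, wss, f_ratio


# Battery lifetimes (hours) from Example 1, one list per brand.
lifetimes = [
    [40, 30, 50, 50, 30],      # Y1
    [60, 40, 55, 65],          # Y2
    [60, 50, 70, 65, 75, 40],  # Y3
]
bss, wss, f_ratio = one_way_anova(lifetimes)
print(f"BSS = {bss:.2f}, WSS = {wss:.2f}, F = {f_ratio:.3f}")
```

The F-ratio comes out as 4.275, which the table rounds to 4.28; comparing it with 3.89 leads to the same rejection of $H_0$.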


Example 2. Three sections of the same elementary statistics course were taught by three instructors, I, II, and III. The final grades of students were recorded as follows:

I      II     III
95     88     68
33     78     79
48     91     91
76     51     71
89     85     87
82     77     68
60     31     79
77     62     16
       96     35
       81


Let us test the hypothesis that the average grades given by the three instructors are the same at level α = 0.05.

From the data $n_1 = 8$, $n_2 = 10$, $n_3 = 9$, $n = 27$, $\bar y_1 = 70$, $\bar y_2 = 74$, $\bar y_3 = 66$,

$\sum_{j=1}^{8}(y_{1j} - \bar y_1)^2 = 3168, \quad \sum_{j=1}^{10}(y_{2j} - \bar y_2)^2 = 3686, \quad \sum_{j=1}^{9}(y_{3j} - \bar y_3)^2 = 4898.$

Also, the grand mean is

$\bar y = \dfrac{560 + 740 + 594}{27} = \dfrac{1894}{27} = 70.15.$

Thus

$BSS = 8(0.15)^2 + 10(3.85)^2 + 9(4.15)^2 = 303.4075$

and

$WSS = 3168 + 3686 + 4898 = 11{,}752.$

Analysis of Variance
Source     SS          d.f.   MSS      F-Ratio
Between    303.41      2      151.70   151.70/489.67 = 0.31
Within     11,752.00   24     489.67

Since F = 0.31 is far below $F_{2,24,0.05} = 3.40$, we cannot reject the null hypothesis that the average grades given by the three instructors are the same.
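Because the grand mean 1894/27 is not a round number, the hand computation above rounds it to 70.15. A minimal sketch using exact rational arithmetic (Python's `fractions` module) avoids that rounding and also confirms that the Mean, Between, and Within rows of the analysis of variance table add up to the Total row, as the decomposition (10) promises. The variable names are our own.

```python
from fractions import Fraction

# Final grades from Example 2, one list per instructor.
grades = [
    [95, 33, 48, 76, 89, 82, 60, 77],              # I
    [88, 78, 91, 51, 85, 77, 31, 62, 96, 81],      # II
    [68, 79, 91, 71, 87, 68, 79, 16, 35],          # III
]

k = len(grades)
n = sum(len(g) for g in grades)
grand = Fraction(sum(map(sum, grades)), n)             # 1894/27, kept exact
means = [Fraction(sum(g), len(g)) for g in grades]

bss = sum(len(g) * (m - grand) ** 2 for g, m in zip(grades, means))
wss = sum((y - m) ** 2 for g, m in zip(grades, means) for y in g)
mean_ss = n * grand ** 2                               # the "Mean" row, n * ybar^2
tss = sum(y * y for g in grades for y in g)            # TSS = sum of all y_ij^2

# Mean + Between + Within reproduces the Total row, as in (10).
assert mean_ss + bss + wss == tss

f_ratio = (bss / (k - 1)) / (wss / (n - k))
print(float(bss), float(wss), float(f_ratio))
```

The exact between sum of squares is about 303.407, so the rounding in the text has no effect on the conclusion.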
PROBLEMS 12.4


1. Prove statements (5), (6), (7), and (8). 




2. The following are the coded values of the amounts of corn (in bushels per acre) obtained from four varieties, using unequal numbers of plots for the different varieties:

Variety 1: 2, 1, 3, 2
Variety 2: 3, 4, 2, 3, 4, 2
Variety 3: 6, 4, 8
Variety 4: 7, 6, 7, 4

Test whether there is a significant difference between the yields of the varieties.


3. A consumer interested in buying a new car has reduced his search to six different brands: D, F, G, P, V, T. He would like to buy the brand that gives the highest mileage per gallon of regular gasoline. One of his friends advises him that he should use some other method of selection, since the average mileages of the six brands are the same, and offers the following data in support of her assertion.

Distance Traveled (Miles) per Gallon of Gasoline

         Brand
Car      D     F     G
1        42    38    28
2        35    33    32
3        37    28    35
4        37    37
5
6

Should the consumer accept his friend's advice?





25 


24 


4. The following data give the ages of entering freshmen in independent random samples from three different universities, A, B, and C.


A 





17 
19 
20 
21 
18 





B 


16 
16 
19 








21 
23 
22 
20 
19 
Test the hypothesis that the average ages of entering freshmen at these universities are the same.


5. Five cigarette manufacturers claim that their product has low tar content. Independent random samples of cigarettes are taken from each manufacturer and the following tar levels (in milligrams) are recorded.

Brand    Tar Level (mg)
A        4.2, 4.8, 4.6, 4.0, 4.4
B        4.9, 4.8, 4.7, 5.0, 4.9, 5.2
C        5.4, 5.3, 5.4, 5.2, 5.5
D        5.8, 5.6, 5.5, 5.4, 5.6, 5.8
E        5.9, 6.2, 6.2, 6.8, 6.4, 6.3

Can the differences among the sample means be attributed to chance?


6. The quantity of oxygen dissolved in water is used as a measure of water pollution. Samples are taken at four locations in a lake and the quantity of dissolved oxygen is recorded as follows (lower reading corresponds to greater pollution):

Location    Quantity of Dissolved Oxygen (%)
A           7.8, 6.4, 8.2, 6.9
B           6.7, 6.8, 7.1, 6.9, 7.3
C           7.2, 7.4, 6.9, 6.4, 6.5
D           6.0, 7.4, 6.5, 6.9, 7.2, 6.8

Do the data indicate a significant difference in the average amount of dissolved oxygen for the four locations?


12.5 TWO-WAY ANALYSIS OF VARIANCE WITH 
ONE OBSERVATION PER CELL 


In many practical problems one is interested in investigating the effects of two fac- 
tors that influence an outcome. For example, the variety of grain and the type of 
fertilizer used both affect the yield of a plot; or the score on a standard examination 
is influenced by the size of the class and the instructor. 

Let us suppose that two factors affect the outcome of an experiment. Suppose also that one observation is available at each of a number of levels of these two factors. Let $Y_{ij}$ ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$) be the observation when the first factor is at the ith level and the second factor at the jth level. Assume that

(1) $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b,$

where $\alpha_i$ is the effect of the ith level of the first factor, $\beta_j$ is the effect of the jth level of the second factor, and $\varepsilon_{ij}$ is the random error, which is assumed to be normally distributed with mean 0 and variance $\sigma^2$. We will assume that the $\varepsilon_{ij}$'s are independent. It follows that the $Y_{ij}$ are independent normal RVs with means $\mu + \alpha_i + \beta_j$ and variance $\sigma^2$. There is no loss of generality in assuming that $\sum_{i=1}^a \alpha_i = \sum_{j=1}^b \beta_j = 0$, for if $\mu_{ij} = \mu' + \alpha'_i + \beta'_j$, we can write

$\mu_{ij} = (\mu' + \bar\alpha' + \bar\beta') + (\alpha'_i - \bar\alpha') + (\beta'_j - \bar\beta') = \mu + \alpha_i + \beta_j$

and $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$. Here we have written $\bar\alpha'$ and $\bar\beta'$ for the means of the $\alpha'_i$'s and $\beta'_j$'s, respectively. Thus $Y_{ij}$ may denote the yield from use of the ith variety of some grain and the jth type of some fertilizer. The two hypotheses of interest are

$\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \quad \text{and} \quad \beta_1 = \beta_2 = \cdots = \beta_b = 0.$


The first of these, for example, says that the first factor has no effect on the outcome 
of the experiment. 


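In view of the fact that $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$, $\alpha_a = -\sum_{i=1}^{a-1}\alpha_i$ and $\beta_b = -\sum_{j=1}^{b-1}\beta_j$, and we can write our model in matrix notation as

(2) $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

where

$\mathbf{Y} = (Y_{11}, Y_{21}, \ldots, Y_{a1}, Y_{12}, Y_{22}, \ldots, Y_{a2}, \ldots, Y_{1b}, Y_{2b}, \ldots, Y_{ab})',$
$\boldsymbol{\beta} = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1})',$
$\boldsymbol{\varepsilon} = (\varepsilon_{11}, \varepsilon_{21}, \ldots, \varepsilon_{a1}, \varepsilon_{12}, \varepsilon_{22}, \ldots, \varepsilon_{a2}, \ldots, \varepsilon_{1b}, \varepsilon_{2b}, \ldots, \varepsilon_{ab})',$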


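and X is the design matrix whose jth block of a rows (corresponding to $Y_{1j}, \ldots, Y_{aj}$) is

$X_j = (\mathbf{1}_a \;\; A \;\; \mathbf{1}_a\mathbf{e}_j'), \quad j = 1, \ldots, b - 1, \qquad X_b = (\mathbf{1}_a \;\; A \;\; -\mathbf{1}_a\mathbf{1}_{b-1}'),$

where $\mathbf{1}_r$ is a column of r ones, $\mathbf{e}_j$ is the jth unit vector in $\mathbb{R}^{b-1}$, and $A = \begin{pmatrix} I_{a-1} \\ -\mathbf{1}_{a-1}' \end{pmatrix}$ is the $a \times (a - 1)$ matrix that encodes $\alpha_a = -\sum_{i=1}^{a-1}\alpha_i$.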
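The vector of unknown parameters β is $(a + b - 1) \times 1$, and the matrix X is $ab \times (a + b - 1)$ (b blocks of a rows each). We leave the reader to check that X is of full rank, $a + b - 1$. The hypotheses $H_\alpha\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ or $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$ can easily be put into the form $H\boldsymbol{\beta} = \mathbf{0}$. For example, for $H_\beta$ we can choose H to be the $(b - 1) \times (a + b - 1)$ matrix of full rank $b - 1$, given by

$H = (\mathbf{0}_{(b-1) \times a} \;\; I_{b-1}),$

so that $H\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{b-1})'$, which vanishes exactly when all the $\beta_j$ (including $\beta_b = -\sum_{j=1}^{b-1}\beta_j$) are 0.

Clearly, the model described above is a special case of the general linear hypothesis, and we can use Theorem 12.2.1 to test $H_\beta$.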
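Problem 1 of Problems 12.5 asks for a proof that X has full rank $a + b - 1$. A numerical check on a few small layouts is not a proof, but it is a cheap way to confirm the bookkeeping. The sketch assumes the sum-to-zero parametrization ($\alpha_a$ and $\beta_b$ eliminated) and orders the observations in b blocks of a rows; `design_matrix` and `rank` are hypothetical helper names, and the rank is computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction


def design_matrix(a, b):
    """Rows for Y_11, Y_21, ..., Y_a1, Y_12, ..., Y_ab (b blocks of a rows);
    columns for (mu, alpha_1..alpha_{a-1}, beta_1..beta_{b-1})."""
    rows = []
    for j in range(b):              # level j+1 of factor 2
        for i in range(a):          # level i+1 of factor 1
            if i < a - 1:
                alpha = [1 if t == i else 0 for t in range(a - 1)]
            else:                   # alpha_a = -(alpha_1 + ... + alpha_{a-1})
                alpha = [-1] * (a - 1)
            if j < b - 1:
                beta = [1 if t == j else 0 for t in range(b - 1)]
            else:                   # beta_b = -(beta_1 + ... + beta_{b-1})
                beta = [-1] * (b - 1)
            rows.append([1] + alpha + beta)
    return rows


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


for a, b in [(2, 2), (3, 4), (4, 3)]:
    X = design_matrix(a, b)
    print(a, b, "shape", (len(X), len(X[0])), "rank", rank(X))
```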
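To apply Theorem 12.2.1, we need the estimators $\hat\mu_{ij}$ and $\hat{\hat\mu}_{ij}$. It is easily checked that

(3) $\hat\mu = \bar y = \dfrac{1}{ab}\sum_{i=1}^a\sum_{j=1}^b y_{ij}$

and

(4) $\hat\alpha_i = \bar y_{i\cdot} - \bar y, \qquad \hat\beta_j = \bar y_{\cdot j} - \bar y,$

where $\bar y_{i\cdot} = \sum_{j=1}^b y_{ij}/b$ and $\bar y_{\cdot j} = \sum_{i=1}^a y_{ij}/a$. Also, under $H_\beta$, for example,

(5) $\hat{\hat\mu} = \bar y \quad \text{and} \quad \hat{\hat\alpha}_i = \bar y_{i\cdot} - \bar y.$

In the notation of Theorem 12.2.1, $n = ab$, $k = a + b - 1$, $r = b - 1$, so that $n - k = ab - a - b + 1 = (a - 1)(b - 1)$, and

(6) $F = \dfrac{\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot})^2 - \sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2}{\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2}.$

Since

(7) $\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot})^2 = \sum_{i=1}^a\sum_{j=1}^b [(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y) + (\bar y_{\cdot j} - \bar y)]^2 = \sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2 + a\sum_{j=1}^b (\bar y_{\cdot j} - \bar y)^2,$

we may write

(8) $F = \dfrac{a\sum_{j=1}^b (\bar y_{\cdot j} - \bar y)^2}{\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2}.$

It follows that under $H_\beta$, $(a - 1)F$ has a central $F(b - 1, (a - 1)(b - 1))$ distribution. The numerator of F in (8) measures the variability between the means $\bar Y_{\cdot j}$, and the denominator measures the variability that exists once the effects due to the two factors have been subtracted.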
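If $H_\alpha$ is the null hypothesis to be tested, one can show that under $H_\alpha$ the MLEs are

(9) $\hat{\hat\mu} = \bar y \quad \text{and} \quad \hat{\hat\beta}_j = \bar y_{\cdot j} - \bar y.$

As before, $n = ab$, $k = a + b - 1$, but $r = a - 1$. Also,

(10) $F = \dfrac{\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{\cdot j})^2 - \sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2}{\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2},$

which may be rewritten as

(11) $F = \dfrac{b\sum_{i=1}^a (\bar y_{i\cdot} - \bar y)^2}{\sum_{i=1}^a\sum_{j=1}^b (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2}.$

It follows that under $H_\alpha$, $(b - 1)F$ has a central $F(a - 1, (a - 1)(b - 1))$ distribution. The numerator of F in (11) measures the variability between the means $\bar Y_{i\cdot}$.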
If the data are put into the following form:

                              Level of factor 2
                              1               2               ...   b               Row mean
Level of factor 1    1        $Y_{11}$        $Y_{12}$        ...   $Y_{1b}$        $\bar Y_{1\cdot}$
                     2        $Y_{21}$        $Y_{22}$        ...   $Y_{2b}$        $\bar Y_{2\cdot}$
                     ⋮        ⋮               ⋮                     ⋮               ⋮
                     a        $Y_{a1}$        $Y_{a2}$        ...   $Y_{ab}$        $\bar Y_{a\cdot}$
Column mean                   $\bar Y_{\cdot 1}$   $\bar Y_{\cdot 2}$   ...   $\bar Y_{\cdot b}$   $\bar Y$

so that the rows represent various levels of factor 1, and the columns, the levels of factor 2, one can write

between sum of squares for rows $= b\sum_{i=1}^a (\bar y_{i\cdot} - \bar y)^2$ = sum of squares for factor 1 = $SS_1$.

Similarly,

between sum of squares for columns $= a\sum_{j=1}^b (\bar y_{\cdot j} - \bar y)^2$ = sum of squares for factor 2 = $SS_2$.

It is usual to write error or residual sum of squares (SSE) for the denominator of (8) or (11). These results are conveniently presented in an analysis of variance table as follows:
Two-Way Analysis of Variance Table with One Observation per Cell

Source of    Sum of                                  Degrees of       Mean                                   F-Ratio
Variation    Squares                                 Freedom          Square
Rows         $SS_1$                                  $a - 1$          $MS_1 = SS_1/(a - 1)$                  $MS_1/MSE$
Columns      $SS_2$                                  $b - 1$          $MS_2 = SS_2/(b - 1)$                  $MS_2/MSE$
Error        SSE                                     $(a - 1)(b - 1)$ $MSE = SSE/[(a - 1)(b - 1)]$
Mean         $ab\bar y^2$                            1                $ab\bar y^2$
Total        $\sum_{i=1}^a\sum_{j=1}^b y_{ij}^2$     $ab$             $\sum_{i=1}^a\sum_{j=1}^b y_{ij}^2/ab$


Example 1. The following table gives the yield (pounds per plot) of three varieties of wheat, obtained with four different kinds of fertilizers.

               Variety of Wheat
Fertilizer     A      B      C
α              8      3      7
β              10     4      8
γ              6      5      6
δ              8      4      7

Let us test the hypothesis of equality in the average yields of the three varieties of wheat and the null hypothesis that the four fertilizers are equally effective. In our notation, $b = 3$, $a = 4$, $\bar y_{1\cdot} = 6$, $\bar y_{2\cdot} = 7.33$, $\bar y_{3\cdot} = 5.67$, $\bar y_{4\cdot} = 6.33$, $\bar y_{\cdot 1} = 8$, $\bar y_{\cdot 2} = 4$, $\bar y_{\cdot 3} = 7$, $\bar y = 6.33$. Also,

$SS_1$ = sum of squares due to fertilizer $= 3[(0.33)^2 + 1^2 + (0.67)^2 + 0^2] = 4.67,$

$SS_2$ = sum of squares due to variety of wheat $= 4[(1.67)^2 + (2.33)^2 + (0.67)^2] = 34.67,$

and

$SSE = \sum_{i=1}^4\sum_{j=1}^3 (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y)^2 = 7.33.$
The results are shown in the following table:

Analysis of Variance
Source              SS       df    MS       F-Ratio
Variety of wheat    34.67    2     17.33    14.2
Fertilizer          4.67     3     1.56     1.28
Error               7.33     6     1.22
Mean                481.33   1     481.33
Total               528.00   12    44.00

Now $F_{2,6,0.05} = 5.14$ and $F_{3,6,0.05} = 4.76$. Since 14.2 > 5.14, we reject $H_\beta$, the hypothesis of equal average yields for the three varieties; but since 1.28 < 4.76, we accept $H_\alpha$, that the four fertilizers are equally effective.
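The sums of squares and F-ratios of Example 1 can be recomputed with the following sketch, which implements the one-observation-per-cell formulas $SS_1$, $SS_2$, SSE, (8), and (11) with the standard library only. The function name `two_way_one_obs` is our own.

```python
def two_way_one_obs(y):
    """y[i][j] is the single observation at level i of factor 1 and level j of factor 2.
    Returns SS1 (rows), SS2 (columns), SSE and the two F-ratios of the table above."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row_means = [sum(r) / b for r in y]
    col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ss1 = b * sum((m - grand) ** 2 for m in row_means)
    ss2 = a * sum((m - grand) ** 2 for m in col_means)
    # residual: y_ij - ybar_i. - ybar_.j + ybar
    sse = sum((y[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(a) for j in range(b))
    mse = sse / ((a - 1) * (b - 1))
    return ss1, ss2, sse, (ss1 / (a - 1)) / mse, (ss2 / (b - 1)) / mse


# Rows: fertilizers alpha, beta, gamma, delta (factor 1); columns: varieties A, B, C (factor 2).
yields = [
    [8, 3, 7],
    [10, 4, 8],
    [6, 5, 6],
    [8, 4, 7],
]
ss1, ss2, sse, f_rows, f_cols = two_way_one_obs(yields)
print(f"SS1={ss1:.2f} SS2={ss2:.2f} SSE={sse:.2f} F_rows={f_rows:.2f} F_cols={f_cols:.2f}")
```

Computed without rounding, the fertilizer ratio is exactly 14/11, about 1.27; the 1.28 in the table comes from dividing the rounded mean squares 1.56 and 1.22. Either value leads to the same conclusion.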


PROBLEMS 12.5 


1. Show that the matrix X for the model defined in (2) is of full rank, $a + b - 1$.
2. Prove statements (3), (4), (5), and (9). 


3. The following data represent the units of production per day turned out by four different brands of machines used by four machinists:

             Machinist
Machine      $A_1$    $A_2$    $A_3$    $A_4$
$B_1$        15       14       19       18
$B_2$        17       12       20       16
$B_3$        16       18       16       17
$B_4$        16       16       15       15

Test whether the differences in the performances of the machinists are significant and also whether the differences in the performances of the four brands of machines are significant. Use α = 0.05.


4. Students were classified into four ability groups, and three different teaching methods were employed. The following table gives the means for the four groups:

            Teaching Method
Ability
Group       A      B      C
1           15     19     14
2           18     17     12
3           22     25     17
4           17     21     19


Test the hypothesis that the teaching methods yield the same results, that is, that 
the teaching methods are equally effective. 


5. The following table shows the yield (pounds per plot) of four varieties of wheat obtained with three different kinds of fertilizers.

               Variety of Wheat
Fertilizer     A      B      C      D
α              8      3      6      7
β              10     4      5      8
γ              8      4      6      7





Test the hypotheses that the four varieties of wheat yield the same average yield 
and that the three fertilizers are equally effective. 


12.6 TWO-WAY ANALYSIS OF VARIANCE WITH INTERACTION 


The model described in Section 12.5 assumes that the two factors act independently, that is, are additive. In practice, this is an assumption that needs testing. In this section we allow for the possibility that the two factors might jointly affect the outcome; that is, there might be interactions. More precisely, if $Y_{ij}$ is the observation in the (i, j)th cell, we will consider the model

(1) $Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij},$

where $\alpha_i$ ($i = 1, 2, \ldots, a$) represent row effects (or effects due to factor 1), $\beta_j$ ($j = 1, 2, \ldots, b$) represent column effects (or effects due to factor 2), and $\gamma_{ij}$ represent interactions or joint effects. We assume that the $\varepsilon_{ij}$ are independently distributed as $N(0, \sigma^2)$. We assume further that

(2) $\sum_{i=1}^a \alpha_i = 0, \quad \sum_{j=1}^b \beta_j = 0, \quad \sum_{j=1}^b \gamma_{ij} = 0 \text{ for all } i, \quad \sum_{i=1}^a \gamma_{ij} = 0 \text{ for all } j.$
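The hypothesis of interest is

(3) $H_0\colon \gamma_{ij} = 0$ for all i, j.

One may also be interested in testing that all α's are 0 or that all β's are 0 in the presence of interactions $\gamma_{ij}$.

We first note that (2) is not restrictive since we can write

$Y_{ij} = \mu' + \alpha'_i + \beta'_j + \gamma'_{ij} + \varepsilon_{ij},$

where $\alpha'_i$, $\beta'_j$, and $\gamma'_{ij}$ do not satisfy (2), as

$Y_{ij} = (\mu' + \bar\alpha' + \bar\beta' + \bar\gamma'_{\cdot\cdot}) + (\alpha'_i - \bar\alpha' + \bar\gamma'_{i\cdot} - \bar\gamma'_{\cdot\cdot}) + (\beta'_j - \bar\beta' + \bar\gamma'_{\cdot j} - \bar\gamma'_{\cdot\cdot}) + (\gamma'_{ij} - \bar\gamma'_{i\cdot} - \bar\gamma'_{\cdot j} + \bar\gamma'_{\cdot\cdot}) + \varepsilon_{ij},$

and then (2) is satisfied by choosing

$\mu = \mu' + \bar\alpha' + \bar\beta' + \bar\gamma'_{\cdot\cdot}, \qquad \alpha_i = \alpha'_i - \bar\alpha' + \bar\gamma'_{i\cdot} - \bar\gamma'_{\cdot\cdot},$

$\beta_j = \beta'_j - \bar\beta' + \bar\gamma'_{\cdot j} - \bar\gamma'_{\cdot\cdot}, \quad \text{and} \quad \gamma_{ij} = \gamma'_{ij} - \bar\gamma'_{i\cdot} - \bar\gamma'_{\cdot j} + \bar\gamma'_{\cdot\cdot}.$

Here

$\bar\alpha' = a^{-1}\sum_{i=1}^a \alpha'_i, \quad \bar\beta' = b^{-1}\sum_{j=1}^b \beta'_j, \quad \bar\gamma'_{i\cdot} = b^{-1}\sum_{j=1}^b \gamma'_{ij},$

$\bar\gamma'_{\cdot j} = a^{-1}\sum_{i=1}^a \gamma'_{ij}, \quad \text{and} \quad \bar\gamma'_{\cdot\cdot} = (ab)^{-1}\sum_{i=1}^a\sum_{j=1}^b \gamma'_{ij}.$

Next note that unless we replicate, that is, take more than one observation per cell, there are no degrees of freedom left to estimate the error SS (see Remark 1).

Let $Y_{ijs}$ be the sth observation when the first factor is at the ith level and the second factor at the jth level, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $s = 1, 2, \ldots, m$ ($m > 1$). Then the model becomes as follows: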
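                    Level of Factor 2
Level of
Factor 1    1                              2                              ...   b
1           $Y_{111}, \ldots, Y_{11m}$     $Y_{121}, \ldots, Y_{12m}$     ...   $Y_{1b1}, \ldots, Y_{1bm}$
2           $Y_{211}, \ldots, Y_{21m}$     $Y_{221}, \ldots, Y_{22m}$     ...   $Y_{2b1}, \ldots, Y_{2bm}$
⋮
a           $Y_{a11}, \ldots, Y_{a1m}$     $Y_{a21}, \ldots, Y_{a2m}$     ...   $Y_{ab1}, \ldots, Y_{abm}$

(4) $Y_{ijs} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijs},$

$i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $s = 1, 2, \ldots, m$, where the $\varepsilon_{ijs}$'s are independent $N(0, \sigma^2)$. We assume that $\sum_{i=1}^a \alpha_i = \sum_{j=1}^b \beta_j = \sum_{j=1}^b \gamma_{ij} = \sum_{i=1}^a \gamma_{ij} = 0$.

Suppose that we wish to test $H_\alpha\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$. We leave the reader to check that model (4) is then a special case of the general linear hypothesis with $n = abm$, $k = ab$, $r = a - 1$, and $n - k = ab(m - 1)$.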


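Let us write

(5) $\bar Y = \dfrac{1}{abm}\sum_{i=1}^a\sum_{j=1}^b\sum_{s=1}^m Y_{ijs}, \quad \bar Y_{ij\cdot} = \dfrac{1}{m}\sum_{s=1}^m Y_{ijs}, \quad \bar Y_{i\cdot\cdot} = \dfrac{1}{bm}\sum_{j=1}^b\sum_{s=1}^m Y_{ijs}, \quad \bar Y_{\cdot j\cdot} = \dfrac{1}{am}\sum_{i=1}^a\sum_{s=1}^m Y_{ijs}.$

Then it can be easily checked that

(6) $\hat\mu = \bar Y, \quad \hat\alpha_i = \bar Y_{i\cdot\cdot} - \bar Y, \quad \hat\beta_j = \bar Y_{\cdot j\cdot} - \bar Y, \quad \hat\gamma_{ij} = \bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y,$

so that $\hat\mu_{ij} = \bar Y_{ij\cdot}$, while under $H_\alpha$ the estimators of μ, $\beta_j$, and $\gamma_{ij}$ are unchanged and $\hat{\hat\mu}_{ij} = \bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} + \bar Y$.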
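It follows from Theorem 12.2.1 that

(7) $F = \dfrac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot} + \bar Y_{i\cdot\cdot} - \bar Y)^2 - \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2}.$

Since

$\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot} + \bar Y_{i\cdot\cdot} - \bar Y)^2 = \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2 + bm\sum_i (\bar Y_{i\cdot\cdot} - \bar Y)^2,$

we can write (7) as

$F = \dfrac{bm\sum_i (\bar Y_{i\cdot\cdot} - \bar Y)^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2}.$

Under $H_\alpha$ the statistic $[ab(m - 1)/(a - 1)]F$ has the central $F(a - 1, ab(m - 1))$ distribution, so that the likelihood ratio test rejects $H_\alpha$ if

(8) $\dfrac{ab(m - 1)}{a - 1} \cdot \dfrac{bm\sum_i (\bar Y_{i\cdot\cdot} - \bar Y)^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2} > F_\alpha.$

A similar analysis holds for testing $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$.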

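Next consider the test of hypothesis $H_\gamma\colon \gamma_{ij} = 0$ for all i, j, that is, that the two factors are independent and the effects are additive. In this case, $n = abm$, $k = ab$, $r = (a - 1)(b - 1)$, and $n - k = ab(m - 1)$. It can be shown that

(9) $\hat\mu_{ij} = \bar Y_{ij\cdot}$

and, under $H_\gamma$,

(10) $\hat{\hat\mu} = \bar Y, \quad \hat{\hat\alpha}_i = \bar Y_{i\cdot\cdot} - \bar Y, \quad \text{and} \quad \hat{\hat\beta}_j = \bar Y_{\cdot j\cdot} - \bar Y.$

Thus

(11) $F = \dfrac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2 - \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2}.$

Now

$\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2 = \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot} + \bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2 = \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2 + m\sum_i\sum_j (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2,$

so that we may write

(12) $F = \dfrac{m\sum_i\sum_j (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2}.$

Under $H_\gamma$, the statistic $(m - 1)abF/[(a - 1)(b - 1)]$ has the $F((a - 1)(b - 1), ab(m - 1))$ distribution. The likelihood ratio test rejects $H_\gamma$ if

(13) $\dfrac{(m - 1)ab}{(a - 1)(b - 1)} \cdot \dfrac{m\sum_i\sum_j (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij\cdot})^2} > F_\alpha.$

Let us write

$SS_1$ = sum of squares due to factor 1 (row sum of squares) $= bm\sum_{i=1}^a (\bar Y_{i\cdot\cdot} - \bar Y)^2,$

$SS_2$ = sum of squares due to factor 2 (column sum of squares) $= am\sum_{j=1}^b (\bar Y_{\cdot j\cdot} - \bar Y)^2,$

SSI = sum of squares due to interaction $= m\sum_{i=1}^a\sum_{j=1}^b (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y)^2,$

and

SSE = sum of squares due to error (residual sum of squares) $= \sum_{i=1}^a\sum_{j=1}^b\sum_{s=1}^m (Y_{ijs} - \bar Y_{ij\cdot})^2.$

Then we may summarize the foregoing results in the following table.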


Two-Way Analysis of Variance Table with Interaction

Source of      Sum of                                             Degrees of         Mean Square                        F-Ratio
Variation      Squares                                            Freedom
Rows           $SS_1$                                             $a - 1$            $MS_1 = SS_1/(a - 1)$              $MS_1/MSE$
Columns        $SS_2$                                             $b - 1$            $MS_2 = SS_2/(b - 1)$              $MS_2/MSE$
Interaction    SSI                                                $(a - 1)(b - 1)$   $MSI = SSI/[(a - 1)(b - 1)]$       $MSI/MSE$
Error          SSE                                                $ab(m - 1)$        $MSE = SSE/[ab(m - 1)]$
Mean           $abm\bar Y^2$                                      1                  $abm\bar Y^2$
Total          $\sum_{i=1}^a\sum_{j=1}^b\sum_{s=1}^m Y_{ijs}^2$   $abm$              $\sum_{i=1}^a\sum_{j=1}^b\sum_{s=1}^m Y_{ijs}^2/abm$
Remark 1. Note that if m = 1, there are no d.f.'s associated with the SSE. Indeed, SSE = 0 if m = 1. Hence we cannot make tests of hypotheses when m = 1, and for this reason we assume that m > 1.


Example 1. To test the effectiveness of three different teaching methods, three instructors were randomly assigned 12 students each. The students were then randomly assigned to the different teaching methods and were taught exactly the same material. At the conclusion of the experiment, identical examinations were given to the students with the following results in regard to grades:

                       Instructor
Teaching Method        I      II     III
1                      95     60     86
                       85     90     77
                       74     80     75
                       74     70     70
2                      90     89     83
                       80     90     70
                       92     91     75
                       82     86     72
3                      70     68     74
                       80     73     86
                       85     78     91
                       85     93     89

From the data the table of means is as follows:

                       I      II     III    Row mean
Method 1               82     75     77     78.0
Method 2               86     89     75     83.3
Method 3               80     78     85     81.0
Column mean            82.7   80.7   79.0   80.8





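Then

$SS_1$ = sum of squares due to methods $= bm\sum_{i=1}^a (\bar y_{i\cdot\cdot} - \bar y)^2 = 3 \times 4 \times 14.13 = 169.56,$

$SS_2$ = sum of squares due to instructors $= am\sum_{j=1}^b (\bar y_{\cdot j\cdot} - \bar y)^2 = 3 \times 4 \times 6.86 = 82.32,$

SSI = sum of squares due to interaction $= m\sum_{i=1}^3\sum_{j=1}^3 (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y)^2 = 561.80,$

and

SSE = residual sum of squares $= \sum_{i=1}^3\sum_{j=1}^3\sum_{s=1}^4 (y_{ijs} - \bar y_{ij\cdot})^2 = 1830.00.$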
Analysis of Variance

Source          SS        d.f.   MSS      F-Ratio
Methods         169.56    2      84.78    1.25
Instructors     82.32     2      41.16    0.61
Interactions    561.80    4      140.45   2.07
Error           1830.00   27     67.78

With α = 0.05, we see from the tables that $F_{2,27,0.05} = 3.35$ and $F_{4,27,0.05} = 2.73$, so that we cannot reject any of the three hypotheses that the three methods are equally effective, that the three instructors are equally effective, and that the interactions are all 0.
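The sums of squares of this example can be recomputed from the raw grades with a sketch of the formulas for $SS_1$, $SS_2$, SSI, and SSE. It assumes equal replication m in every cell (so row, column, and grand means may be taken as averages of cell means) and reads the fourth score for method 1 under instructor I as 74, the value consistent with SSE = 1830.00 and SSI = 561.80. The function name is our own.

```python
def two_way_with_interaction(y):
    """y[i][j] is the list of m replicates in cell (i, j); every cell has the same m.
    Returns SS1, SS2, SSI, SSE and the F-ratios for rows, columns, and interaction."""
    a, b, m = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(obs) / m for obs in row] for row in y]                    # Ybar_ij.
    grand = sum(map(sum, cell)) / (a * b)                                   # Ybar
    row_mean = [sum(r) / b for r in cell]                                   # Ybar_i..
    col_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]    # Ybar_.j.
    ss1 = b * m * sum((r - grand) ** 2 for r in row_mean)
    ss2 = a * m * sum((c - grand) ** 2 for c in col_mean)
    ssi = m * sum((cell[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                  for i in range(a) for j in range(b))
    sse = sum((obs - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for obs in y[i][j])
    mse = sse / (a * b * (m - 1))
    return (ss1, ss2, ssi, sse,
            ss1 / (a - 1) / mse,
            ss2 / (b - 1) / mse,
            ssi / ((a - 1) * (b - 1)) / mse)


# grades[method][instructor] = the four students' grades.
grades = [
    [[95, 85, 74, 74], [60, 90, 80, 70], [86, 77, 75, 70]],  # method 1: I, II, III
    [[90, 80, 92, 82], [89, 90, 91, 86], [83, 70, 75, 72]],  # method 2
    [[70, 80, 85, 85], [68, 73, 78, 93], [74, 86, 91, 89]],  # method 3
]
ss1, ss2, ssi, sse, f_methods, f_instructors, f_interaction = two_way_with_interaction(grades)
print(f"SS1={ss1:.2f} SS2={ss2:.2f} SSI={ssi:.2f} SSE={sse:.2f}")
print(f"F: methods={f_methods:.2f} instructors={f_instructors:.2f} interaction={f_interaction:.2f}")
```

With unrounded means the method and instructor sums of squares come out near 171.56 and 80.89 rather than 169.56 and 82.32; the text works with means rounded to one decimal. The F-ratios (about 1.27, 0.60, and 2.07) stay below the critical values, so the conclusions are unchanged.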


PROBLEMS 12.6 


1. Prove statement (6).

2. Obtain the likelihood ratio test of the null hypothesis $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$.


3. Prove statement (10). 


4. Suppose that the following data represent the units of production turned out each day by three different machinists, each working on the same machine for three different days:

             Machinist
Machine      A              B              C
$B_1$        15, 15, 17     19, 19, 16     16, 18, 21
$B_2$        17, 17, 17     15, 15, 15     19, 22, 22
$B_3$        15, 17, 16     18, 17, 16     18, 18, 18
$B_4$        18, 20, 22     15, 16, 17     17, 17, 17
Using a 0.05 level of significance, test whether (a) the differences among the machinists are significant, (b) the differences among the machines are significant, and (c) the interactions are significant.


5. In an experiment to determine whether four different makes of automobiles average the same gasoline mileage, a random sample of two cars of each make was taken from each of four cities. Each car was then test run on 5 gallons of gasoline of the same brand. The following table gives the number of miles traveled.

                  Automobile Make
City              A               B               C               D
Cleveland         92.3, 104.1     90.4, 103.8     110.2, 115.0    120.0, 125.4
Detroit           96.2, 98.6      91.8, 100.4     112.3, 111.7    124.1, 121.1
San Francisco     90.8, 96.2      90.3, 89.1      107.2, 103.8    118.4, 115.6
Denver            98.5, 97.3      96.8, 98.8      115.2, 110.2    126.2, 120.4

Construct the analysis of variance table. Test the hypothesis of no automobile effect, no city effect, and no interactions. Use α = 0.05.


CHAPTER 13


Nonparametric Statistical Inference 


13.1 INTRODUCTION 


In all the problems of statistical inference considered so far, we assumed that the 
distribution of the random variable being sampled is known except, perhaps, for 
some parameters. In practice, however, the functional form of the distribution is sel- 
dom, if ever, known. It is therefore desirable to devise methods that are free of this 
assumption concerning distribution. In this chapter we study some procedures that 
are commonly referred to as distribution-free or nonparametric methods. The term distribution-free refers to the fact that no assumptions are made about the underlying distribution except that the distribution function being sampled is absolutely continuous. The term nonparametric refers to the fact that there are no parameters involved in the traditional sense of the term parameter used thus far. To be sure, there is a parameter that indexes the family of absolutely continuous DFs, but it is not numerical, and hence the parameter set cannot be represented as a subset of $\mathbb{R}_k$ for any $k \ge 1$. The restriction to absolutely continuous distribution functions is a simplifying assumption that allows us to use the probability integral transformation (Theorem 5.3.1) and the fact that ties occur with probability 0.

Section 13.2 is devoted to the problem of unbiased (nonparametric) estimation. We develop the theory of U-statistics since many estimators and test statistics may be viewed as U-statistics. Sections 13.3 through 13.5 deal with some common hypothesis-testing problems. In Section 13.6 we investigate applications of order statistics in nonparametric methods. Section 13.7 considers underlying assumptions in some common parametric problems and the effect of relaxing these assumptions.


13.2 U-STATISTICS 


In Chapter 7 we encountered several nonparametric estimators. For example, the empirical DF defined in Section 7.3 as an estimator of the population DF is distribution-free, and so also are the sample moments as estimators of the population moments. These are examples of what are known as U-statistics, which lead to unbiased estimators of population characteristics. In this section we study the general theory of U-statistics. Although the thrust of this investigation is unbiased estimation, many of the U-statistics defined in this section may be used as test statistics.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common law $\mathcal{L}(X)$, and let $\mathcal{P}$ be the class of all possible distributions of X that consists of the absolutely continuous or discrete distributions, or subclasses of these.


Definition 1. A statistic T(X) is sufficient for the family of distributions $\mathcal{P}$ if the conditional distribution of X, given T = t, is the same whatever the true $F \in \mathcal{P}$.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF, and let $T = (X_{(1)}, \ldots, X_{(n)})$ be the order statistic. Then

$f(\mathbf{x} \mid T = \mathbf{t}) = (n!)^{-1},$

and we see that T is sufficient for the family of absolutely continuous distributions on $\mathbb{R}$.


Definition 2. A family of distributions $\mathcal{P}$ is complete if the only unbiased estimator of 0 is the zero function itself, that is,

$E_F h(X) = 0$ for all $F \in \mathcal{P}$ $\Rightarrow$ $h(x) = 0$

for all x (except for a null set with respect to each $F \in \mathcal{P}$).

Definition 3. A statistic T(X) is said to be complete in relation to a class of distributions $\mathcal{P}$ if the class of induced distributions of T is complete.


We have already encountered many examples of complete statistics or complete 
families of distributions in Chapter 8. 


The following result is stated without proof. For the proof we refer to Fraser [29, 
pp. 27-30, 139-142]. 


Theorem 1. The order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is a complete sufficient statistic provided that the iid RVs $X_1, X_2, \ldots, X_n$ are of either the discrete or continuous type.


Definition 4. A real-valued parameter g(F) is said to be estimable if it has an unbiased estimator, that is, if there exists a statistic T(X) such that

(1) $E_F T(\mathbf{X}) = g(F)$ for all $F \in \mathcal{P}$.

Example 2. If $\mathcal{P}$ is the class of all distributions for which the second moment exists, $\bar X$ is an unbiased estimator of $\mu(F)$, the population mean. Similarly, $\mu_2(F) = \operatorname{var}_F(X)$ is also estimable, and an unbiased estimator is $S^2 = \sum_{i=1}^n (X_i - \bar X)^2/(n - 1)$. We would like to know whether $\bar X$ and $S^2$ are UMVUEs. Similarly, F(x) and $P(X_1 + X_2 > 0)$ are estimable for $F \in \mathcal{P}$.
Definition 5. The degree m ($m \ge 1$) of an estimable parameter g(F) is the smallest sample size for which the parameter is estimable; that is, it is the smallest m such that there exists an unbiased estimator $T(X_1, X_2, \ldots, X_m)$ with

$E_F T = g(F)$ for all $F \in \mathcal{P}$.

Example 3. The parameter $g(F) = P(X > c)$, where c is a known constant, has degree 1. Also, $\mu(F)$ is estimable with degree 1 [we assume that there is at least one $F \in \mathcal{P}$ such that $\mu(F) \ne 0$], and $\mu^2(F)$ is estimable with degree m = 2, since $\mu^2(F)$ cannot be estimated (unbiasedly) by one observation only. At least two observations are needed. Similarly, $\mu_2(F)$ has degree 2, and $P(X_1 + X_2 > 0)$ also is of degree 2.


Definition 6. An unbiased estimator of a parameter based on the smallest sample size (equal to degree m) is called a kernel.

Example 4. Clearly, $X_i$, $1 \le i \le n$, is a kernel of $\mu(F)$; $T(X_i) = 1$ if $X_i > c$, and $= 0$ if $X_i \le c$, is a kernel of $P(X > c)$. Similarly, $T(X_i, X_j) = 1$ if $X_i + X_j > 0$, and $= 0$ otherwise, is a kernel of $P(X_1 + X_2 > 0)$; $X_iX_j$ is a kernel of $\mu^2(F)$; and $X_i^2 - X_iX_j$ is a kernel of $\mu_2(F)$.

Lemma 1. There exists a symmetric kernel for every estimable parameter.

Proof. If $T(X_1, X_2, \ldots, X_m)$ is a kernel of g(F), so also is

(2) $T_s(X_1, X_2, \ldots, X_m) = \dfrac{1}{m!}\sum_P T(X_{i_1}, X_{i_2}, \ldots, X_{i_m}),$

since each of the m! terms has expectation g(F); here the summation P is over all m! permutations $(i_1, i_2, \ldots, i_m)$ of $\{1, 2, \ldots, m\}$.

Example 5. A symmetric kernel for $\mu_2(F)$ is

$T_s(X_i, X_j) = \tfrac{1}{2}[T(X_i, X_j) + T(X_j, X_i)] = \tfrac{1}{2}(X_i - X_j)^2, \quad i, j = 1, 2, \ldots, n;\ i \ne j.$
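The averaging in (2) is mechanical, so it can be sketched directly: the helper below (a hypothetical name, `symmetrize`) averages a kernel over all orderings of its arguments. It assumes integer or `Fraction` arguments so that the result stays exact, and it checks Example 5 on a few arbitrary points.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def symmetrize(kernel, m):
    """Return T_s of (2): the average of `kernel` over all m! orderings of its m arguments."""
    def t_s(*xs):
        if len(xs) != m:
            raise ValueError(f"expected {m} arguments")
        total = sum(kernel(*(xs[p] for p in perm)) for perm in permutations(range(m)))
        return Fraction(total, factorial(m))
    return t_s


def kernel_mu2(xi, xj):
    """Non-symmetric kernel x_i^2 - x_i x_j of mu_2(F) from Example 4."""
    return xi ** 2 - xi * xj


T_s = symmetrize(kernel_mu2, 2)
for xi, xj in [(3, 7), (-2, 5), (4, 4)]:
    # Example 5: the symmetrized kernel equals (x_i - x_j)^2 / 2.
    assert T_s(xi, xj) == Fraction((xi - xj) ** 2, 2)
print(T_s(3, 7), T_s(7, 3))
```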
Definition 7. Let g(F) be an estimable parameter of degree m, and let $X_1, X_2, \ldots, X_n$ be a sample of size n, $n \ge m$. Corresponding to any kernel $T(X_{i_1}, \ldots, X_{i_m})$ of g(F), we define a U-statistic for the sample by

(3) $U(X_1, X_2, \ldots, X_n) = \dbinom{n}{m}^{-1}\sum_C T_s(X_{i_1}, \ldots, X_{i_m}),$

where the summation C is over all $\binom{n}{m}$ combinations of m integers $(i_1, i_2, \ldots, i_m)$ chosen from $\{1, 2, \ldots, n\}$, and $T_s$ is the symmetric kernel defined in (2).
Clearly, the U-statistic defined in (3) is symmetric in the $X_i$'s, and

(4) $E_F U(\mathbf{X}) = g(F)$ for all F.

Moreover, $U(\mathbf{X})$ is a function of the complete sufficient statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$. It follows from Theorem 8.4.6 that it is the UMVUE of its expected value.
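Definition (3) translates almost word for word into code: average a symmetric kernel over every size-m subset of the sample. The sketch below assumes the kernel passed in is already symmetric (as $T_s$ is) and uses `itertools.combinations` and `math.comb` with exact `Fraction` results. The five numbers are the $Y_1$ battery lifetimes from Example 1 of Section 12.4, reused purely as illustrative input, and c = 35 is a hypothetical constant.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def u_statistic(sample, kernel, m):
    """U-statistic (3): the average of the symmetric kernel `kernel`
    (a function of m arguments) over all C(n, m) size-m subsets of `sample`."""
    n = len(sample)
    if n < m:
        raise ValueError("the sample size must be at least the degree m")
    total = sum(kernel(*subset) for subset in combinations(sample, m))
    return Fraction(total) / comb(n, m)


sample = [40, 30, 50, 50, 30]  # illustrative data

mean_u = u_statistic(sample, lambda x: x, 1)              # kernel X_i of mu(F)
tail_u = u_statistic(sample, lambda x: int(x > 35), 1)    # kernel of P(X > c), c = 35
square_u = u_statistic(sample, lambda x, y: x * y, 2)     # kernel X_i X_j of mu^2(F)
print(mean_u, tail_u, square_u, mean_u ** 2)
```

Note that the unbiased estimate 1580 of $\mu^2(F)$ differs from the plug-in value $\bar X^2 = 1600$, which is biased for $\mu^2(F)$.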


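Example 6. For estimating $\mu(F)$, the U-statistic is $n^{-1}\sum_{i=1}^n X_i$. For estimating $\mu_2(F)$, a symmetric kernel is

$T_s(X_{i_1}, X_{i_2}) = \tfrac{1}{2}(X_{i_1} - X_{i_2})^2, \quad i_1, i_2 = 1, 2, \ldots, n;\ i_1 \ne i_2,$

so that the corresponding U-statistic is

$U(\mathbf{X}) = \dbinom{n}{2}^{-1}\sum_{i_1 < i_2} \tfrac{1}{2}(X_{i_1} - X_{i_2})^2 = \dfrac{1}{n - 1}\sum_{i=1}^n (X_i - \bar X)^2 = S^2.$

Similarly, for estimating $\mu^2(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}) = X_{i_1}X_{i_2}$, and the corresponding U-statistic is

$U(\mathbf{X}) = \dbinom{n}{2}^{-1}\sum_{i < j} X_iX_j = \dfrac{1}{n(n - 1)}\sum_{i \ne j} X_iX_j.$

For estimating $\mu^3(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}, X_{i_3}) = X_{i_1}X_{i_2}X_{i_3}$, so that the corresponding U-statistic is

$U(\mathbf{X}) = \dbinom{n}{3}^{-1}\sum_{i < j < k} X_iX_jX_k = \dfrac{1}{n(n - 1)(n - 2)}\sum_{i, j, k \text{ distinct}} X_iX_jX_k.$

For estimating F(x), a symmetric kernel is $I_{[X_i \le x]}$, so the corresponding U-statistic is

$U(\mathbf{X}) = \dfrac{1}{n}\sum_{i=1}^n I_{[X_i \le x]} = F_n^*(x),$

and for estimating $P(X > 0)$ the U-statistic is

$U(\mathbf{X}) = \dfrac{1}{n}\sum_{i=1}^n I_{[X_i > 0]} = 1 - F_n^*(0).$

Finally, for estimating $P(X_1 + X_2 > 0)$ the U-statistic is

$U(\mathbf{X}) = \dbinom{n}{2}^{-1}\sum_{i < j} I_{[X_i + X_j > 0]}.$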
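Two of the closed forms in Example 6 are easy to confirm numerically: the U-statistic with kernel $(X_i - X_j)^2/2$ equals $S^2$, and the average over $i < j < k$ equals the average over ordered triples of distinct indices. The sketch uses exact rational arithmetic on an arbitrary illustrative sample; it checks the identities on one data set and is not a proof.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

x = [40, 30, 50, 50, 30]   # arbitrary illustrative sample
n = len(x)
xbar = Fraction(sum(x), n)

# U-statistic with kernel (x_i - x_j)^2 / 2 versus the usual S^2.
u_var = Fraction(sum((a - b) ** 2 for a, b in combinations(x, 2)), 2 * comb(n, 2))
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
assert u_var == s2

# U-statistic for mu^3(F): unordered triples i < j < k versus ordered distinct triples.
u_cube = Fraction(sum(a * b * c for a, b, c in combinations(x, 3)), comb(n, 3))
ordered = sum(x[i] * x[j] * x[k] for i, j, k in permutations(range(n), 3))
assert u_cube == Fraction(ordered, n * (n - 1) * (n - 2))
print(u_var, s2, u_cube)
```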


Theorem 2. The variance of the U-statistic defined in (3) is given by

(5) $\operatorname{var} U(\mathbf{X}) = \dbinom{n}{m}^{-1}\sum_{c=1}^m \dbinom{m}{c}\dbinom{n - m}{m - c}\zeta_c,$

where

$\zeta_c = \operatorname{cov}_F[T_s(X_{i_1}, \ldots, X_{i_m}), T_s(X_{j_1}, \ldots, X_{j_m})],$

with m the degree of g(F), and c is the number of integers common to the sets $\{i_1, \ldots, i_m\}$ and $\{j_1, \ldots, j_m\}$. [For c = 0, the two statistics $T_s(X_{i_1}, \ldots, X_{i_m})$ and $T_s(X_{j_1}, \ldots, X_{j_m})$ are independent and have zero covariance.]


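Proof. Clearly,

$\operatorname{var} U(\mathbf{X}) = \dbinom{n}{m}^{-2}\sum_C\sum_C E_F\{[T_s(X_{i_1}, \ldots, X_{i_m}) - g(F)][T_s(X_{j_1}, \ldots, X_{j_m}) - g(F)]\}.$

Let c be the number of common integers in $\{i_1, i_2, \ldots, i_m\}$ and $\{j_1, j_2, \ldots, j_m\}$. Then c takes values $0, 1, \ldots, m$, and for c = 0, $T_s(X_{i_1}, \ldots, X_{i_m})$ and $T_s(X_{j_1}, \ldots, X_{j_m})$ are independent. It follows that

(6) $\operatorname{var} U(\mathbf{X}) = \dbinom{n}{m}^{-2}\sum_{c=0}^m \dbinom{n}{m}\dbinom{m}{c}\dbinom{n - m}{m - c}\zeta_c,$

which is (5). The counting argument behind (6) is as follows: First we select the integers $\{i_1, \ldots, i_m\}$ from $\{1, 2, \ldots, n\}$ in $\binom{n}{m}$ ways. Next we select the integers in $\{j_1, \ldots, j_m\}$. This is done by selecting first the c integers that will be in $\{i_1, \ldots, i_m\}$ (hence common to both sets), in $\binom{m}{c}$ ways, and then the $m - c$ integers from the $n - m$ integers that are not in $\{i_1, \ldots, i_m\}$, in $\binom{n - m}{m - c}$ ways. Note that $\zeta_0 = 0$ from independence.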


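Example 7. Consider the U-statistic estimator $\bar X$ of $g(F) = \mu(F)$ in Example 6. Here m = 1, T(x) = x, and $\zeta_1 = \operatorname{var}(X) = \sigma^2$, so that $\operatorname{var}(\bar X) = \sigma^2/n$.

For the parameter $g(F) = \mu_2(F)$, $U(\mathbf{X}) = S^2$. In this case, m = 2, $T_s(X_{i_1}, X_{i_2}) = (X_{i_1} - X_{i_2})^2/2$, so

$\operatorname{var} U(\mathbf{X}) = \dbinom{n}{2}^{-1}[2(n - 2)\zeta_1 + \zeta_2],$

where

$\zeta_2 = E_F\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2 - \sigma^2\right]^2 = \dfrac{\mu_4 + \sigma^4}{2}$

and

$\zeta_1 = \operatorname{cov}\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2, \tfrac{1}{2}(X_{i_1} - X_{j_2})^2\right], \quad i_2 \ne j_2.$

Then

$\zeta_1 = \dfrac{\mu_4 - \sigma^4}{4}$

and

$\operatorname{var}(S^2) = \dfrac{2}{n(n - 1)}\left[2(n - 2)\dfrac{\mu_4 - \sigma^4}{4} + \dfrac{\mu_4 + \sigma^4}{2}\right] = \dfrac{1}{n}\left(\mu_4 - \dfrac{n - 3}{n - 1}\sigma^4\right),$

which agrees with Corollary 2 to Theorem 7.3.5.

For the parameter $g(F) = F(x)$, $\operatorname{var} U(\mathbf{X}) = F(x)[1 - F(x)]/n$, and for $g(F) = P(X_1 + X_2 > 0)$,

$\operatorname{var} U(\mathbf{X}) = \dfrac{2}{n(n - 1)}[2(n - 2)\zeta_1 + \zeta_2],$

where

$\zeta_1 = P(X_1 + X_2 > 0, X_1 + X_3 > 0) - P^2(X_1 + X_2 > 0)$

and

$\zeta_2 = P(X_1 + X_2 > 0) - P^2(X_1 + X_2 > 0) = P(X_1 + X_2 > 0)P(X_1 + X_2 \le 0).$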
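Formula (5) and the closed form for $\operatorname{var}(S^2)$ can be checked exactly on a small discrete distribution by enumerating every possible sample. The three-point distribution below (P(X = 0) = 1/2, P(X = 1) = P(X = 3) = 1/4) is a hypothetical choice made only for this check; `var_u` and `exact_var_s2` are our own helper names.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical three-point distribution used only for the check.
support = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
mu = sum(v * p for v, p in support.items())
sigma2 = sum((v - mu) ** 2 * p for v, p in support.items())
mu4 = sum((v - mu) ** 4 * p for v, p in support.items())


def var_u(n, m, zetas):
    """Theorem 2, formula (5); zetas[c - 1] holds zeta_c."""
    return sum(comb(m, c) * comb(n - m, m - c) * zetas[c - 1]
               for c in range(1, m + 1)) / comb(n, m)


def exact_var_s2(n):
    """var(S^2) obtained by enumerating all samples of size n with their probabilities."""
    e1 = e2 = Fraction(0)
    for sample in product(support, repeat=n):
        p = Fraction(1)
        for v in sample:
            p *= support[v]
        xbar = Fraction(sum(sample), n)
        s2 = sum((v - xbar) ** 2 for v in sample) / (n - 1)
        e1 += p * s2
        e2 += p * s2 ** 2
    return e2 - e1 ** 2


zeta1 = (mu4 - sigma2 ** 2) / 4
zeta2 = (mu4 + sigma2 ** 2) / 2
for n in (2, 3, 4, 5):
    formula = var_u(n, 2, [zeta1, zeta2])
    closed = (mu4 - Fraction(n - 3, n - 1) * sigma2 ** 2) / n
    assert formula == closed == exact_var_s2(n)
    print(n, formula)
```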


Corollary to Theorem 2. Let U be the U-statistic for a symmetric kernel $T_s(X_1, X_2, \ldots, X_m)$. Suppose that $E_F[T_s^2(X_1, \ldots, X_m)] < \infty$. Then

$$\lim_{n\to\infty}[n \operatorname{var} U(\mathbf X)] = m^2\zeta_1. \tag{7}$$

Proof. It is easily shown that $0 \le \zeta_c \le \zeta_m$ for $1 \le c \le m$. It follows from the hypothesis $\zeta_m = \operatorname{var}[T_s(X_1, \ldots, X_m)] < \infty$ and (5) that $\operatorname{var} U(\mathbf X) < \infty$. Now

$$n\,\frac{\binom{m}{c}\binom{n-m}{m-c}}{\binom{n}{m}}\,\zeta_c = \frac{(m!)^2}{c!\,[(m-c)!]^2}\cdot\frac{n\,[(n-m)!]^2}{n!\,(n-2m+c)!}\,\zeta_c = \frac{(m!)^2}{c!\,[(m-c)!]^2}\cdot\frac{n(n-m)(n-m-1)\cdots(n-2m+c+1)}{n(n-1)\cdots(n-m+1)}\,\zeta_c.$$

Note that the numerator has $m - c + 1$ factors involving n, while the denominator has m such factors, so that for $c > 1$ the ratio involving n goes to zero as $n \to \infty$. For $c = 1$ this ratio tends to 1, and

$$n \operatorname{var} U(\mathbf X) \to \frac{(m!)^2}{[(m-1)!]^2}\,\zeta_1 = m^2\zeta_1$$

as $n \to \infty$.
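The limiting argument can be checked numerically. The sketch below (a hypothetical helper using only Python's standard library) evaluates the weight $n\binom{m}{c}\binom{n-m}{m-c}/\binom{n}{m}$ that multiplies $\zeta_c$ in $n \operatorname{var} U(\mathbf X)$; as n grows, the weight for $c = 1$ approaches $m^2$ while the others vanish.

```python
from math import comb


def weight_of_zeta(n, m, c):
    """Weight n * C(m, c) * C(n - m, m - c) / C(n, m) of zeta_c in n * var U(X)."""
    return n * comb(m, c) * comb(n - m, m - c) / comb(n, m)


m = 3
for n in (10, 100, 10_000):
    # the c = 1 entry should approach m^2 = 9; the c = 2, 3 entries should shrink toward 0
    print(n, [round(weight_of_zeta(n, m, c), 4) for c in range(1, m + 1)])
```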


Example 8. In Example 7, $n\operatorname{var}(\bar X) = \sigma^2$ and

$$n\operatorname{var}(S^2) \to 2^2\zeta_1 = \mu_4 - \sigma^4$$

as $n \to \infty$.


Finally, we state, without proof, the following result due to Hoeffding [42], which establishes the asymptotic normality of a suitably centered and normed U-statistic. For a proof we refer to Lehmann [59, pp. 364–365] or Randles and Wolfe [83, p. 82].


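Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF F, and let g(F) be an estimable parameter of degree m with symmetric kernel $T_s(X_1, X_2, \ldots, X_m)$. If $E_F\{T_s^2(X_1, X_2, \ldots, X_m)\} < \infty$ and U is the U-statistic for g [as defined in (3)], then $\sqrt n\,\{U(\mathbf X) - g(F)\} \xrightarrow{L} N(0, m^2\zeta_1)$, provided that

$$\zeta_1 = \operatorname{cov}_F[T_s(X_{i_1}, \ldots, X_{i_m}),\ T_s(X_{j_1}, \ldots, X_{j_m})] > 0,$$

where the two sets of indices have exactly one integer in common.

In view of the corollary to Theorem 2, it follows that $[U - g(F)]/\sqrt{\operatorname{var}(U)} \xrightarrow{L} N(0, 1)$, provided that $\zeta_1 > 0$.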


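Example 9 (Example 7 continued). Clearly, $\sqrt n\,(\bar X - \mu)/\sigma \xrightarrow{L} N(0, 1)$ as $n \to \infty$, since $\zeta_1 = \sigma^2 > 0$. For the parameter $g(F) = \mu_2(F)$,

$$\operatorname{var} U(\mathbf X) = \operatorname{var}(S^2) = \frac1n\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right), \qquad \zeta_1 = \frac{\mu_4 - \sigma^4}{4},$$

so it follows from Theorem 3 that

$$\sqrt n\,(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4).$$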


The concept of a U-statistic can be extended to multiple random samples. We will restrict ourselves to the case of two samples. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent random samples from DFs F and G, respectively.


Definition 8. A parameter g(F, G) is estimable of degrees $(m_1, m_2)$ if $m_1$ and $m_2$ are the smallest sample sizes for which there exists a statistic $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ such that

$$E_{F,G}\,T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) = g(F, G) \tag{8}$$

for all $F, G \in \mathcal P$.


The statistic T in Definition 8 is called a kernel of g, and a symmetrized version of T, $T_s$, is called a symmetric kernel of g. Without loss of generality, therefore, we assume that the two-sample kernel T in (9) is a symmetric kernel.


Definition 9. Let g(F, G), $F, G \in \mathcal P$, be an estimable parameter of degree $(m_1, m_2)$. Then a (two-sample) U-statistic estimate of g is defined by

$$U(\mathbf X; \mathbf Y) = \binom{n_1}{m_1}^{-1}\binom{n_2}{m_2}^{-1}\sum_{i \in A}\sum_{j \in B} T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}}), \tag{9}$$

where A and B are the collections of all subsets of $m_1$ and $m_2$ integers chosen without replacement from the sets $\{1, 2, \ldots, n_1\}$ and $\{1, 2, \ldots, n_2\}$, respectively.


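Example 10. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent samples from DFs F and G, respectively. Let

$$g(F, G) = P(X < Y) = \int_{-\infty}^{\infty} F(x)\,g(x)\,dx = \int_{-\infty}^{\infty} [1 - G(y)]\,f(y)\,dy,$$

where f and g are the respective PDFs of F and G. Then

$$T(X_i; Y_j) = \begin{cases} 1 & \text{if } X_i < Y_j,\\ 0 & \text{if } X_i \ge Y_j \end{cases}$$

is an unbiased estimator of g. Clearly, g has degree (1, 1), and the two-sample U-statistic is given by

$$U(\mathbf X; \mathbf Y) = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} T(X_i; Y_j).$$

Theorem 4. The variance of the two-sample U-statistic defined in (9) is given by

$$\operatorname{var} U(\mathbf X; \mathbf Y) = \sum_{c=0}^{m_1}\sum_{d=0}^{m_2}\frac{\binom{m_1}{c}\binom{n_1-m_1}{m_1-c}}{\binom{n_1}{m_1}}\cdot\frac{\binom{m_2}{d}\binom{n_2-m_2}{m_2-d}}{\binom{n_2}{m_2}}\,\zeta_{c,d}, \tag{10}$$

where $\zeta_{c,d}$ is the covariance between $T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}})$ and $T(X_{k_1}, \ldots, X_{k_{m_1}}; Y_{l_1}, \ldots, Y_{l_{m_2}})$ with exactly c X's and d Y's in common.

Corollary. Suppose that $E_{F,G}T^2(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) < \infty$ for all $F, G \in \mathcal P$. Let $N = n_1 + n_2$ and suppose that $n_1, n_2, N \to \infty$ such that $n_1/N \to \lambda$, $n_2/N \to 1 - \lambda$. Then

$$\lim_{N\to\infty} N\operatorname{var} U(\mathbf X; \mathbf Y) = \frac{m_1^2\,\zeta_{1,0}}{\lambda} + \frac{m_2^2\,\zeta_{0,1}}{1-\lambda}.$$

The proofs of Theorem 4 and its corollary parallel those of Theorem 2 and its corollary and are left to the reader.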
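The two-sample U-statistic of Example 10 is simply the proportion of (X, Y) pairs with $X_i < Y_j$. A minimal Python sketch (the function name and the numbers are our own) is:

```python
def p_x_less_y(xs, ys):
    """Two-sample U-statistic of Example 10: (1/(n1*n2)) * #{(i, j): X_i < Y_j}."""
    count = sum(1 for x in xs for y in ys if x < y)
    return count / (len(xs) * len(ys))


print(p_x_less_y([1.2, 3.4], [2.0, 4.1, 5.5]))  # 5 of the 6 pairs have X < Y
```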


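Example 11. For the U-statistic in Example 10,

$$E_{F,G}U^2(\mathbf X; \mathbf Y) = \frac{1}{n_1^2 n_2^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_1}\sum_{l=1}^{n_2}E_{F,G}[T(X_i; Y_j)\,T(X_k; Y_l)].$$

Now

$$E_{F,G}[T(X_i; Y_j)\,T(X_k; Y_l)] = P(X_i < Y_j,\ X_k < Y_l) = \begin{cases}\int_{-\infty}^{\infty}F(x)\,g(x)\,dx & \text{for } i = k,\ j = l,\\ \int_{-\infty}^{\infty}[1 - G(x)]^2 f(x)\,dx & \text{for } i = k,\ j \ne l,\\ \int_{-\infty}^{\infty}F^2(x)\,g(x)\,dx & \text{for } i \ne k,\ j = l,\\ \left[\int_{-\infty}^{\infty}F(x)\,g(x)\,dx\right]^2 & \text{for } i \ne k,\ j \ne l,\end{cases}$$

where f and g are the PDFs of F and G, respectively. Moreover,

$$\zeta_{1,0} = \int_{-\infty}^{\infty}[1 - G(x)]^2 f(x)\,dx - [g(F, G)]^2$$

and

$$\zeta_{0,1} = \int_{-\infty}^{\infty}F^2(x)\,g(x)\,dx - [g(F, G)]^2.$$

It follows that

$$\operatorname{var} U(\mathbf X; \mathbf Y) = \frac{1}{n_1 n_2}\left[g(F, G)\{1 - g(F, G)\} + (n_2 - 1)\zeta_{1,0} + (n_1 - 1)\zeta_{0,1}\right].$$

(For fixed i and j there are $n_2 - 1$ terms with $k = i$, $l \ne j$, which is why $\zeta_{1,0}$ carries the factor $n_2 - 1$; this agrees with (10).) In the special case when F = G, $g(F, G) = \tfrac12$, $\zeta_{1,0} = \zeta_{0,1} = \tfrac13 - \tfrac14 = \tfrac1{12}$, and $\operatorname{var}(U) = (n_1 + n_2 + 1)/(12 n_1 n_2)$.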
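The special-case formula $\operatorname{var}(U) = (n_1 + n_2 + 1)/(12 n_1 n_2)$ can be confirmed exactly for small samples. When F = G is continuous, every assignment of the $n_1 + n_2$ pooled ranks to the two samples is equally likely, so the sketch below (helper names are hypothetical) enumerates all such assignments using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def null_variance_by_enumeration(n1, n2):
    """Exact var of U = #{X_i < Y_j} / (n1 n2) when F = G is continuous.

    Under F = G every choice of which n1 of the N = n1 + n2 pooled ranks
    belong to the X-sample is equally likely, so we enumerate them all.
    """
    N = n1 + n2
    values = []
    for x_ranks in combinations(range(N), n1):
        x_set = set(x_ranks)
        y_ranks = [r for r in range(N) if r not in x_set]
        count = sum(1 for rx in x_ranks for ry in y_ranks if rx < ry)
        values.append(Fraction(count, n1 * n2))
    k = len(values)
    mean = sum(values) / k
    return sum((v - mean) ** 2 for v in values) / k


def null_variance_formula(n1, n2):
    """(n1 + n2 + 1) / (12 n1 n2), from Example 11."""
    return Fraction(n1 + n2 + 1, 12 * n1 * n2)


print(null_variance_by_enumeration(2, 3), null_variance_formula(2, 3))
```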


Finally, we state, without proof, the two-sample analog of Theorem 3, which establishes the asymptotic normality of the two-sample U-statistic defined in (9).

Theorem 5. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be independent random samples from DFs F and G, respectively, and let g(F, G) be an estimable parameter of degree $(m_1, m_2)$. Let $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ be a symmetric kernel for g such that $ET^2 < \infty$. Then

$$\sqrt{n_1 + n_2}\,\{U(\mathbf X; \mathbf Y) - g(F, G)\} \xrightarrow{L} N(0, \sigma^2),$$

where $\sigma^2 = m_1^2\zeta_{1,0}/\lambda + m_2^2\zeta_{0,1}/(1 - \lambda)$, provided that $\sigma^2 > 0$ and $0 < \lambda = \lim_{N\to\infty}(n_1/N) < 1$, $N = n_1 + n_2$.

In view of (12), we see that $(U - g)/\sqrt{\operatorname{var} U}$ is AN(0, 1), provided that $\sigma^2 > 0$.

For a proof of Theorem 5 we refer to Lehmann [59, p. 364] or Randles and Wolfe [83, p. 92].


Example 11 (continued). In Example 11 we saw that in the special case when F = G, $\zeta_{1,0} = \zeta_{0,1} = \tfrac1{12}$ and $\operatorname{var} U = (n_1 + n_2 + 1)/(12 n_1 n_2)$. It follows from the remark following Theorem 5 that

$$\frac{U(\mathbf X; \mathbf Y) - \tfrac12}{\sqrt{(n_1 + n_2 + 1)/(12 n_1 n_2)}} \xrightarrow{L} N(0, 1).$$


PROBLEMS 13.2 


1. Let $(\mathbb R, \mathcal B, P_\theta)$ be a probability space, and let $\mathcal P = \{P_\theta: \theta \in \Theta\}$. Let A be a Borel subset of $\mathbb R$, and consider the parameter $d(\theta) = P_\theta(A)$. Is d estimable? If so, what is its degree? Find the UMVUE for d, based on a sample of size n, assuming that $\mathcal P$ is the class of all continuous distributions.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from two absolutely continuous DFs. Find the UMVUEs of (a) E(XY) and (b) var(X + Y).

3. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from an absolutely continuous distribution. Find the UMVUEs of (a) E(XY) and (b) var(X + Y).

4. Let $T(X_1, X_2, \ldots, X_n)$ be a statistic that is symmetric in the observations. Show that T can be written as a function of the order statistic. Conversely, show that if $T(X_1, X_2, \ldots, X_n)$ can be written as a function of the order statistic, then T is symmetric in the observations.

5. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF F, $F \in \mathcal P$. Find U-statistics for $g_1(F) = \mu^2(F)$ and $g_2(F) = \mu^3(F)$. Find the corresponding expressions for the variance of the U-statistic in each case.

6. In Example 3, show that $\mu_2(F)$ is not estimable with one observation. That is, show that the degree of $\mu_2(F)$, where $F \in \mathcal P$, the class of all distributions with finite second moment, is two.

7. Show that for $c = 1, 2, \ldots, m$, $0 \le \zeta_c \le \zeta_m$.

8. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF F, $F \in \mathcal P$. Let

$$g(F) = E_F|X_1 - X_2|.$$

Find the U-statistic estimator of g(F) and its variance.


13.3 SOME SINGLE-SAMPLE PROBLEMS 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF F. In Section 13.2 we studied properties of U-statistics as nonparametric estimators of parameters g(F). In this section we consider some nonparametric tests of hypotheses. Often the test statistic may be viewed as a function of a U-statistic.


13.3.1  Goodness-of-Fit Problem 


The problem of fit is to test the hypothesis that the sample comes from a specified DF $F_0$ against the alternative that it is from some other DF F, where $F(x) \ne F_0(x)$ for some $x \in \mathbb R$. In Section 10.3 we studied the chi-square test of goodness of fit for testing $H_0: X_i \sim F_0$. Here we consider the Kolmogorov-Smirnov test of $H_0$. Since $H_0$ concerns the underlying DF of the X's, it is natural to compare the U-statistic estimator of $g(F) = F(x)$ with the specified DF $F_0$ under $H_0$. The U-statistic for $g(F) = F(x)$ is the empirical DF $F_n^*(x)$.


Definition 1. Let $X_1, X_2, \ldots, X_n$ be a sample from a DF F, and let $F_n^*$ be the corresponding empirical DF. The statistic

$$D_n = \sup_x |F_n^*(x) - F(x)| \tag{1}$$

is called the (two-sided) Kolmogorov-Smirnov statistic. We write

$$D_n^+ = \sup_x [F_n^*(x) - F(x)] \tag{2}$$

and

$$D_n^- = \sup_x [F(x) - F_n^*(x)], \tag{3}$$

and call $D_n^+$, $D_n^-$ the one-sided Kolmogorov-Smirnov statistics.


Theorem 1. The statistics $D_n$, $D_n^+$, $D_n^-$ are distribution-free for any continuous DF F.

Proof. Clearly, $D_n = \max(D_n^+, D_n^-)$. Let $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ be the order statistics of $X_1, X_2, \ldots, X_n$, and define $X_{(0)} = -\infty$, $X_{(n+1)} = +\infty$. Then

$$F_n^*(x) = \frac{i}{n} \quad \text{for } X_{(i)} \le x < X_{(i+1)}, \quad i = 0, 1, 2, \ldots, n,$$

and we have

$$\begin{aligned} D_n^+ &= \max_{0 \le i \le n}\ \sup_{X_{(i)} \le x < X_{(i+1)}}\left[\frac in - F(x)\right] = \max_{0 \le i \le n}\left[\frac in - \inf_{X_{(i)} \le x < X_{(i+1)}} F(x)\right]\\ &= \max_{0 \le i \le n}\left[\frac in - F(X_{(i)})\right] = \max\left\{0,\ \max_{1 \le i \le n}\left[\frac in - F(X_{(i)})\right]\right\}. \end{aligned}$$

Since $F(X_{(i)})$ is the ith order statistic of a sample from U(0, 1) irrespective of what F is, as long as it is continuous, the distribution of $D_n^+$ does not depend on F. Similarly,

$$D_n^- = \max\left\{0,\ \max_{1 \le i \le n}\left[F(X_{(i)}) - \frac{i-1}{n}\right]\right\},$$

and the result follows.

Without loss of generality, therefore, we assume that F is the DF of a U(0, 1) RV.


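Theorem 2. If F is continuous, then

$$P\left\{D_n < v + \frac{1}{2n}\right\} = \begin{cases} 0 & \text{if } v \le 0,\\[4pt] \displaystyle\int_{\frac{1}{2n} - v}^{v + \frac{1}{2n}}\int_{\frac{3}{2n} - v}^{v + \frac{3}{2n}}\cdots\int_{\frac{2n-1}{2n} - v}^{v + \frac{2n-1}{2n}} f(u_1, u_2, \ldots, u_n)\,du_n \cdots du_1 & \text{if } 0 < v < \frac{2n-1}{2n},\\[4pt] 1 & \text{if } v \ge \frac{2n-1}{2n}, \end{cases} \tag{4}$$

where

$$f(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < \cdots < u_n < 1,\\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

is the joint PDF of the set of order statistics for a sample of size n from U(0, 1).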


We will not prove this result here. Let $D_{n,\alpha}$ be the upper α-percent point of the distribution of $D_n$, that is, $P\{D_n > D_{n,\alpha}\} \le \alpha$. The exact distribution of $D_n$ for selected values of n and α has been tabulated by Miller [72], Owen [77], and Birnbaum [8]. The large-sample distribution of $D_n$ was derived by Kolmogorov [51], and we state it without proof.


Theorem 3. Let F be any continuous DF. Then for every $z > 0$,

$$\lim_{n\to\infty} P\{D_n \le z n^{-1/2}\} = L(z), \tag{6}$$

where

$$L(z) = 1 - 2\sum_{i=1}^{\infty}(-1)^{i-1}e^{-2i^2z^2}. \tag{7}$$

Theorem 3 can be used to find $d_\alpha$ such that $\lim_{n\to\infty} P\{\sqrt n\,D_n \le d_\alpha\} = 1 - \alpha$. Tables of $d_\alpha$ for various values of α are also available in Owen [77].
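A sketch of how $d_\alpha$ might be computed from (7) without tables, assuming a truncation of the series at 100 terms and a simple bisection search (both our own choices, not part of the text):

```python
import math


def kolmogorov_cdf(z, terms=100):
    """Limiting DF L(z) = 1 - 2 * sum_{i>=1} (-1)^(i-1) exp(-2 i^2 z^2) from (7)."""
    if z <= 0:
        return 0.0
    s = sum((-1) ** (i - 1) * math.exp(-2 * i * i * z * z) for i in range(1, terms + 1))
    return 1.0 - 2.0 * s


def kolmogorov_quantile(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Bisection for d_alpha with L(d_alpha) = 1 - alpha (L is increasing in z)."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(kolmogorov_quantile(0.05), 4), round(kolmogorov_quantile(0.10), 4))
```

For example, $d_{0.05} \approx 1.358$, so for n = 20 the large-sample approximation to $D_{20,0.05}$ is $1.358/\sqrt{20} \approx 0.304$, slightly above the exact tabulated value 0.294 used in Example 2.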

The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry, and their common distribution is given by the following theorem.


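Theorem 4. Let F be a continuous DF. Then

$$P\{D_n^+ \le z\} = \begin{cases} 0 & \text{if } z \le 0,\\[4pt] \displaystyle\int_{1-z}^{1}\int_{\frac{n-1}{n} - z}^{u_n}\cdots\int_{\frac1n - z}^{u_2} f(u_1, u_2, \ldots, u_n)\,du_1 \cdots du_n & \text{if } 0 < z < 1,\\[4pt] 1 & \text{if } z \ge 1, \end{cases} \tag{8}$$

where f is given by (5).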


We leave the proof of Theorem 4 to the reader.

Tables of the critical values $D_{n,\alpha}^+$, where $P\{D_n^+ > D_{n,\alpha}^+\} \le \alpha$, are also available for selected values of n and α (see Birnbaum and Tingey [7]). Table ST7 gives $D_{n,\alpha}^+$ and $D_{n,\alpha}$ for some selected values of n and α. For large samples, Smirnov [106] showed that

$$\lim_{n\to\infty} P\{\sqrt n\,D_n^+ \le z\} = 1 - e^{-2z^2}, \quad z > 0. \tag{9}$$

In fact, in view of (9), the statistic $V_n = 4n(D_n^+)^2$ has a limiting $\chi^2(2)$ distribution, for $4n(D_n^+)^2 \le 4z^2$ if and only if $\sqrt n\,D_n^+ \le z$, $z > 0$, and the result follows since

$$\lim_{n\to\infty} P\{V_n \le 4z^2\} = 1 - e^{-2z^2}, \quad z > 0,$$

so that

$$\lim_{n\to\infty} P\{V_n \le x\} = 1 - e^{-x/2}, \quad x > 0,$$

which is the DF of a $\chi^2(2)$ RV.


Example 1. Let α = 0.01, and let us approximate $D_{n,0.01}^+$. We have $\chi^2_{2,0.01} = 9.21$. Thus $V_n = 9.21$, yielding

$$D_{n,0.01}^+ \approx \sqrt{\frac{9.21}{4n}} = \frac{3.03}{2\sqrt n}.$$

If, for example, n = 9, then $D_{9,0.01}^+ \approx 3.03/6 \approx 0.505$. Of course, the approximation is better for large n.
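Because the $\chi^2(2)$ DF is $1 - e^{-x/2}$, its upper α point is $-2\ln\alpha$, so the calculation in Example 1 can be done in closed form. A minimal sketch (the function name is our own):

```python
import math


def d_plus_critical_approx(n, alpha):
    """Large-sample approximation to D^+_{n,alpha} based on (9).

    Setting exp(-2 z^2) = alpha gives z = sqrt(-ln(alpha) / 2), so D^+_{n,alpha}
    is approximately sqrt(-ln(alpha) / (2 n)). Equivalently, 4 n (D^+)^2 is set
    equal to the chi-square(2) upper point -2 ln(alpha).
    """
    return math.sqrt(-math.log(alpha) / (2.0 * n))


chi2_2_upper = -2.0 * math.log(0.01)  # upper 0.01 point of chi-square(2), about 9.21
print(round(chi2_2_upper, 2), round(d_plus_critical_approx(9, 0.01), 4))
```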




The statistic $D_n$ and its one-sided analogs can be used in testing $H_0: X \sim F_0$ against $H_1: X \sim F$, where $F_0(x) \ne F(x)$ for some x.

Definition 2. To test $H_0: F(x) = F_0(x)$ for all x at level α, the Kolmogorov-Smirnov test rejects $H_0$ if $D_n > D_{n,\alpha}$. Similarly, it rejects $F(x) \ge F_0(x)$ for all x if $D_n^- > D_{n,\alpha}^+$, and it rejects $F(x) \le F_0(x)$ for all x at level α if $D_n^+ > D_{n,\alpha}^+$, where $D_n$, $D_n^+$, and $D_n^-$ are computed with $F = F_0$.

For large samples we can use Theorem 3 or (9) to obtain an approximate α-level test.


Example 2. Let us consider the data in Example 10.3.3 and apply the Kolmogorov-Smirnov test to determine the goodness of the fit. Rearranging the data in increasing order of magnitude, we have the following result:

   x(i)     F0(x(i))   F*20(x(i))   i/20 − F0(x(i))   F0(x(i)) − (i−1)/20
  −1.787    0.0367       1/20          0.0133             0.0367
  −1.229    0.1093       2/20         −0.0093             0.0593
  −0.525    0.2998       3/20         −0.1498             0.1998
  −0.513    0.3050       4/20         −0.1050             0.1550
  −0.508    0.3050       5/20         −0.0550             0.1050
  −0.486    0.3121       6/20         −0.0121             0.0621
  −0.482    0.3156       7/20          0.0344             0.0156
  −0.323    0.3745       8/20          0.0255             0.0245
  −0.261    0.3974       9/20          0.0526            −0.0026
  −0.068    0.4721      10/20          0.0279             0.0221
  −0.057    0.4761      11/20          0.0739            −0.0239
   0.137    0.5557      12/20          0.0443             0.0057
   0.464    0.6772      13/20         −0.0272             0.0772
   0.595    0.7257      14/20         −0.0257             0.0757
   0.881    0.8106      15/20         −0.0606             0.1106
   0.906    0.8186      16/20         −0.0186             0.0686
   1.046    0.8531      17/20         −0.0031             0.0531
   1.237    0.8925      18/20          0.0075             0.0425
   1.678    0.9535      19/20         −0.0035             0.0535
   2.455    0.9931      20/20          0.0069             0.0431

From Theorem 1,

$$D_{20}^+ = 0.0739, \qquad D_{20}^- = 0.1998, \qquad D_{20} = \max(D_{20}^+, D_{20}^-) = 0.1998.$$

Let us take α = 0.05. Then $D_{20,0.05} = 0.294$. Since 0.1998 < 0.294, we accept $H_0$ at the 0.05 level of significance.
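The order-statistic formulas in the proof of Theorem 1 translate directly into code. The sketch below (function names are our own) computes $D_n^+$, $D_n^-$, and $D_n$ for the data of Example 2, with $F_0$ the standard normal DF evaluated through `math.erf`. Because it uses exact normal probabilities rather than four-decimal table values, its results may differ slightly in the third decimal place from those in the table above (for instance, it gives about 0.073 for $D_{20}^+$).

```python
import math


def ks_statistics(sample, cdf):
    """Return (D_n^+, D_n^-, D_n) using the order-statistic formulas of Theorem 1."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]  # F0(X_(i)), i = 1, ..., n
    d_plus = max(0.0, max(i / n - u[i - 1] for i in range(1, n + 1)))
    d_minus = max(0.0, max(u[i - 1] - (i - 1) / n for i in range(1, n + 1)))
    return d_plus, d_minus, max(d_plus, d_minus)


def std_normal_cdf(x):
    """Standard normal DF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


example2 = [-1.787, -1.229, -0.525, -0.513, -0.508, -0.486, -0.482, -0.323, -0.261, -0.068,
            -0.057, 0.137, 0.464, 0.595, 0.881, 0.906, 1.046, 1.237, 1.678, 2.455]
print(ks_statistics(example2, std_normal_cdf))
```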


It is worthwhile to compare the chi-square test of goodness of fit and the 
Kolmogorov-Smirnov test. The latter treats individual observations directly, whereas 
the former discretizes the data and sometimes loses information through grouping. 
Moreover, the Kolmogorov-Smirnov test is applicable even in the case of very small 
samples, but the chi-square test is essentially for large samples. 




The chi-square test can be applied when the data are discrete or continuous, but the Kolmogorov-Smirnov test assumes continuity of the DF. This means that the latter test provides a more refined analysis of the data. If the distribution is actually discontinuous, the Kolmogorov-Smirnov test is conservative in that it favors $H_0$.

We next turn our attention to some other uses of the Kolmogorov-Smirnov statistic. Let $X_1, X_2, \ldots, X_n$ be a sample from a DF F, and let $F_n^*$ be the sample DF. For large n the estimate $F_n^*$ of F should be close to F. Indeed, since $nF_n^*(x) \sim b(n, F(x))$, Chebyshev's inequality gives

$$P\left\{|F_n^*(x) - F(x)| \ge \lambda\sqrt{\frac{F(x)[1 - F(x)]}{n}}\right\} \le \frac{1}{\lambda^2}, \tag{10}$$

and since $F(x)[1 - F(x)] \le \tfrac14$, we have

$$P\left\{|F_n^*(x) - F(x)| \ge \frac{\lambda}{2\sqrt n}\right\} \le \frac{1}{\lambda^2}. \tag{11}$$

Thus $F_n^*$ can be made close to F with high probability by choosing λ and a large enough n. The Kolmogorov-Smirnov statistic enables us to determine the smallest n such that the error in estimation never exceeds a fixed value ε with a large probability $1 - \alpha$. Since

$$P\{D_n \le \varepsilon\} = 1 - \alpha, \tag{12}$$

$\varepsilon = D_{n,\alpha}$, and given ε and α we can read n from the tables. For large n we can use the asymptotic distribution of $D_n$ and solve $d_\alpha = \varepsilon\sqrt n$ for n.
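Solving $d_\alpha = \varepsilon\sqrt n$ gives $n = (d_\alpha/\varepsilon)^2$, rounded up to an integer. A minimal sketch, assuming the asymptotic value $d_{0.05} \approx 1.358$ obtained from (7):

```python
import math


def smallest_n(eps, d_alpha):
    """Smallest n with d_alpha / sqrt(n) <= eps, i.e. n >= (d_alpha / eps)^2.

    d_alpha is the asymptotic critical value with lim P{sqrt(n) D_n <= d_alpha} = 1 - alpha.
    """
    return math.ceil((d_alpha / eps) ** 2)


# With d_0.05 approximately 1.358 (from the limiting distribution (7)):
print(smallest_n(0.10, 1.358), smallest_n(0.05, 1.358))
```

Halving the tolerable error ε roughly quadruples the required sample size, as the $\sqrt n$ scaling implies.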

We can also form confidence bounds for F. Given α and n, we first find $D_{n,\alpha}$ such that

$$P\{D_n > D_{n,\alpha}\} \le \alpha, \tag{13}$$

which is the same as

$$P\left\{\sup_x |F_n^*(x) - F(x)| \le D_{n,\alpha}\right\} \ge 1 - \alpha.$$

Thus

$$P\{|F_n^*(x) - F(x)| \le D_{n,\alpha} \text{ for all } x\} \ge 1 - \alpha. \tag{14}$$

Define

$$L_n(x) = \max\{F_n^*(x) - D_{n,\alpha},\ 0\} \tag{15}$$

and

$$U_n(x) = \min\{F_n^*(x) + D_{n,\alpha},\ 1\}. \tag{16}$$

Then the region between $L_n(x)$ and $U_n(x)$ can be used as a confidence band for F(x) with associated confidence coefficient $1 - \alpha$.

Example 3. For the data on the standard normal distribution of Example 2, let us form a 0.90 confidence band for the DF. We have $D_{20,0.10} = 0.265$. The confidence band is, therefore, $F_{20}^*(x) \pm 0.265$, as long as the band stays between 0 and 1.
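Since $F_n^*$ is a step function, the band (15)-(16) only needs to be evaluated at the order statistics; below $X_{(1)}$ it is $[0, \min(D_{n,\alpha}, 1)]$. A minimal sketch (the function name and the toy sample are our own; the critical value must be supplied by the user, here 0.265 as in Example 3):

```python
def ks_confidence_band(sample, d):
    """Return (x_(i), L_n, U_n) at each order statistic, following (15)-(16).

    F_n(x) = i/n on [x_(i), x_(i+1)); d is the critical value D_{n,alpha}.
    """
    xs = sorted(sample)
    n = len(xs)
    band = []
    for i, x in enumerate(xs, start=1):
        fn = i / n
        band.append((x, max(fn - d, 0.0), min(fn + d, 1.0)))
    return band


for row in ks_confidence_band([0.3, 0.1, 0.2, 0.4], 0.265):
    print(row)
```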


13.3.2 Problem of Location 


Let $X_1, X_2, \ldots, X_n$ be a sample of size n from some unknown DF F. Let p be a real number, 0 < p < 1, and let $\mathfrak z_p(F)$ denote the quantile of order p for the DF F. In the following analysis we assume that F is absolutely continuous. The problem of location is to test $H_0: \mathfrak z_p(F) = \mathfrak z_0$, $\mathfrak z_0$ a given number, against one of the alternatives $\mathfrak z_p(F) > \mathfrak z_0$, $\mathfrak z_p(F) < \mathfrak z_0$, and $\mathfrak z_p(F) \ne \mathfrak z_0$. The problem of location and symmetry is to test $H_0: \mathfrak z_{0.5}(F) = \mathfrak z_0$ and F is symmetric, against $H_1: \mathfrak z_{0.5}(F) \ne \mathfrak z_0$ or F is not symmetric.
We consider two tests of location. First, we describe the sign test.


Sign Test 
Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF f. Consider the hypothesis-testing problem

$$H_0: \mathfrak z_p(f) = \mathfrak z_0 \quad\text{against}\quad H_1: \mathfrak z_p(f) > \mathfrak z_0, \tag{17}$$

where $\mathfrak z_p(f)$ is the quantile of order p of the PDF f, 0 < p < 1. Let $g(F) = P(X_i > \mathfrak z_0) = P(X_i - \mathfrak z_0 > 0)$. Then the corresponding U-statistic is given by

$$nU(\mathbf X) = R^+(\mathbf X),$$

the number of positive elements in $X_1 - \mathfrak z_0, X_2 - \mathfrak z_0, \ldots, X_n - \mathfrak z_0$. Clearly, $P(X_i = \mathfrak z_0) = 0$. Fraser [29, pp. 167–170] has shown that a UMP test of $H_0$ against $H_1$ is given by

$$\varphi(\mathbf x) = \begin{cases} 1, & R^+(\mathbf x) > c,\\ \gamma, & R^+(\mathbf x) = c,\\ 0, & R^+(\mathbf x) < c, \end{cases} \tag{18}$$

where c and γ are chosen from the size restriction

$$\alpha = \sum_{r=c+1}^{n}\binom nr (1-p)^r p^{n-r} + \gamma\binom nc (1-p)^c p^{n-c}. \tag{19}$$

Note that under $H_0$, $\mathfrak z_p(f) = \mathfrak z_0$, so that $P_{H_0}(X \le \mathfrak z_0) = p$ and $R^+(\mathbf X) \sim b(n, 1-p)$. The same test is UMP for $H_0: \mathfrak z_p(f) \le \mathfrak z_0$ against $H_1: \mathfrak z_p(f) > \mathfrak z_0$. For the two-sided case, Fraser [29, p. 171] shows that the two-sided sign test is UMP unbiased. If, in particular, $\mathfrak z_0$ is the median of f, then $p = \tfrac12$ under $H_0$. In this case one can also use the sign test to test $H_0$: $\operatorname{med}(X) = \mathfrak z_0$, F is symmetric.
For large n one can use the normal approximation to the binomial to find c and γ in (19).

Example 4. Entering college freshmen have taken a particular high school achievement test for many years, and the upper quartile (p = 0.75) is well established at a score of 195. A particular high school sent 12 of its graduates to college, where they took the examination and obtained scores of 203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216. Let us test the null hypothesis $H_0: \mathfrak z_{0.75} \le 195$ against $H_1: \mathfrak z_{0.75} > 195$ at the α = 0.05 level.

We have to find c and γ such that

$$\sum_{r=c+1}^{12}\binom{12}{r}\left(\frac14\right)^r\left(\frac34\right)^{12-r} + \gamma\binom{12}{c}\left(\frac14\right)^c\left(\frac34\right)^{12-c} = 0.05.$$

From the table of the cumulative binomial distribution (Table ST1) for n = 12, p = 1/4, we see that c = 6. Then γ is given by

$$0.0142 + \gamma\binom{12}{6}\left(\frac14\right)^6\left(\frac34\right)^6 = 0.05.$$

Thus

$$\gamma = \frac{0.0358}{0.0402} = 0.89.$$

In our case the number of positive signs among $x_i - 195$, $i = 1, 2, \ldots, 12$, is 7, so we reject $H_0$ that the upper quartile is ≤ 195.
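The constants c and γ in (19) can be found by scanning the binomial tail. The sketch below (a hypothetical helper, using only `math.comb`) takes q = 1 − p, the success probability of $R^+$ under $H_0$; it reproduces c = 6 and γ ≈ 0.89 for Example 4, and c = 6, γ ≈ 0.14 for the setting of Example 5 below (n = 8, q = 1/2).

```python
from math import comb


def sign_test_constants(n, q, alpha):
    """Return (c, gamma) for the randomized UMP sign test (18)-(19).

    Under H0 the number R+ of positive differences is b(n, q) with q = 1 - p.
    c is the smallest integer with P(R+ > c) <= alpha, and gamma solves
    P(R+ > c) + gamma * P(R+ = c) = alpha, so that 0 <= gamma < 1.
    """
    pmf = [comb(n, r) * q ** r * (1 - q) ** (n - r) for r in range(n + 1)]
    for c in range(n + 1):
        tail = sum(pmf[c + 1:])
        if tail <= alpha:
            return c, (alpha - tail) / pmf[c]
    raise ValueError("alpha must lie in (0, 1)")


scores = [203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216]
r_plus = sum(s > 195 for s in scores)
c, gamma = sign_test_constants(12, 0.25, 0.05)
print(c, round(gamma, 3), r_plus)  # reject H0 since r_plus > c
```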


Example 5. A random sample of size 8 is taken from a normal population with mean 0 and variance 1. The sample values are −0.465, 0.120, −0.238, −0.869, −1.016, 0.417, 0.056, 0.561. Let us test the hypothesis $H_0: \mu = -1.0$ against $H_1: \mu > -1.0$. We should expect to reject $H_0$ since we know that it is false. The number of observations $x_i - \mu_0 = x_i + 1.0$ that are > 0 is 7. We have to find c and γ such that

$$\sum_{r=c+1}^{8}\binom 8r\left(\frac12\right)^8 + \gamma\binom 8c\left(\frac12\right)^8 = 0.05,$$

that is,

$$\sum_{r=c+1}^{8}\binom 8r + \gamma\binom 8c = 12.8.$$

We see that c = 6 and $\gamma = (12.8 - 9)/28 \approx 0.14$. Since the number of positive $x_i - \mu_0$ is > 6, we reject $H_0$.
Let us now apply the parametric test here. We have

$$\bar x = \frac18\sum_{i=1}^{8} x_i = \frac{-1.434}{8} = -0.179.$$

Since σ = 1, we reject $H_0$ if

$$\bar x > \mu_0 + \frac{\sigma}{\sqrt n}\,z_{0.05} = -1.0 + \frac{1.64}{\sqrt 8} = -0.42.$$

Since −0.179 > −0.42, we reject $H_0$.


The single-sample sign test described above can easily be modified to apply to sampling from a bivariate population. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate population. Let $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$, and assume that $Z_i$ has an absolutely continuous DF. Then one can test hypotheses concerning the order parameters of Z by using the sign test. A hypothesis of interest here is that Z has a given median $\mathfrak z_0$. Without loss of generality, let $\mathfrak z_0 = 0$. Then $H_0: \operatorname{med}(Z) = 0$; that is, $P\{Z > 0\} = P\{Z < 0\} = \tfrac12$. Note that med(Z) is not necessarily equal to med(X) − med(Y), so that $H_0$ is not that med(X) = med(Y) but that med(Z) = 0. The sign test is UMP against one-sided alternatives and UMP unbiased against two-sided alternatives.


Example 6. We consider an example due to Hahn and Nelson [37], in which two measuring devices take readings on each of 10 test units. Let X and Y, respectively, be the readings on a test unit by the first and second measuring devices. Let $X = A + \varepsilon_1$, $Y = A + \varepsilon_2$, where A, $\varepsilon_1$, $\varepsilon_2$, respectively, are the contributions to the readings due to the test unit and to the first and second measuring devices. Let A, $\varepsilon_1$, $\varepsilon_2$ be independent with $EA = \mu$, $\operatorname{var}(A) = \sigma_A^2$, $E\varepsilon_1 = E\varepsilon_2 = 0$, $\operatorname{var}(\varepsilon_1) = \sigma_1^2$, $\operatorname{var}(\varepsilon_2) = \sigma_2^2$, so that X and Y have common mean μ and variances $\sigma_A^2 + \sigma_1^2$ and $\sigma_A^2 + \sigma_2^2$, respectively. Also, the covariance between X and Y is $\sigma_A^2$. The data are as follows:

  Test unit             1     2     3     4     5     6     7     8     9    10
  First device, X      71   108    72   140    61    97    90   127   101   114
  Second device, Y     77   105    71   152    88   117    93   130   112   105
  Z = X − Y            −6     3     1    −8   −17   −20    −3    −3   −11     9

Let us test the hypothesis $H_0: \operatorname{med}(Z) = 0$. The number of $Z_i$'s > 0 is 3. We have

$$P\{\text{number of } Z_i\text{'s} > 0 \text{ is} \le 3 \mid H_0\} = \sum_{k=0}^{3}\binom{10}{k}\left(\frac12\right)^{10} = 0.172.$$

Using the two-sided sign test, we cannot reject $H_0$ at level α = 0.05, since 0.172 > 0.025. The RVs $Z_i$ can be considered to be distributed normally, so that under $H_0$ the common mean of the $Z_i$'s is 0. Using a paired-comparison t-test on the data, we can show that t = −0.88 for 9 d.f., so we cannot reject the hypothesis of equality of means of X and Y at level α = 0.05.


Finally, we consider the Wilcoxon signed-ranks test. 


Wilcoxon Signed-Ranks Test 

The sign test for median and symmetry loses information since it ignores the magnitude of the difference between the observations and the hypothesized median. The Wilcoxon signed-ranks test provides an alternative test of location (and symmetry) that also takes into account the magnitudes of these differences.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common absolutely continuous DF F, which is symmetric about the median $\mathfrak z_{1/2}$. The problem is to test $H_0: \mathfrak z_{1/2} = \mathfrak z_0$ against the usual one- or two-sided alternatives. Without loss of generality, we assume that $\mathfrak z_0 = 0$. Then $F(-x) = 1 - F(x)$ for all $x \in \mathbb R$. To test $H_0: F(0) = \tfrac12$ or $\mathfrak z_{1/2} = 0$, we first arrange $|X_1|, |X_2|, \ldots, |X_n|$ in increasing order of magnitude and assign ranks $1, 2, \ldots, n$, keeping track of the original signs of the $X_i$. For example, if n = 4 and $|X_2| < |X_4| < |X_1| < |X_3|$, the rank of $|X_1|$ is 3, of $|X_2|$ is 1, of $|X_3|$ is 4, and of $|X_4|$ is 2.

Let

$$T^+ = \text{sum of the ranks of positive } X_i\text{'s}, \qquad T^- = \text{sum of the ranks of negative } X_i\text{'s}. \tag{20}$$

Then, under $H_0$, we expect $T^+$ and $T^-$ to be about the same. Note that

$$T^+ + T^- = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \tag{21}$$

so that $T^+$ and $T^-$ are linearly related and offer equivalent criteria. Let us define

$$Z_i = \begin{cases} 1 & \text{if } X_i > 0,\\ 0 & \text{if } X_i < 0, \end{cases} \qquad i = 1, 2, \ldots, n, \tag{22}$$

and write $R(|X_i|) = R_i^+$ for the rank of $|X_i|$. Then $T^+ = \sum_{i=1}^{n} R_i^+ Z_i$ and $T^- = \sum_{i=1}^{n} (1 - Z_i)R_i^+$. Also,

$$T^+ - T^- = -\sum_{i=1}^{n} R_i^+ + 2\sum_{i=1}^{n} Z_i R_i^+ = 2\sum_{i=1}^{n} Z_i R_i^+ - \frac{n(n+1)}{2}. \tag{23}$$

The statistic $T^+$ (or $T^-$) is known as the Wilcoxon statistic. A large value of $T^+$ (or, equivalently, a small value of $T^-$) means that most of the large deviations from 0 are positive, and therefore we reject $H_0$ in favor of the alternative $H_1: \mathfrak z_{1/2} > 0$.

A similar analysis applies to the other two alternatives. We record the results as follows:

  H0                        H1                          Reject H0 if
  $\mathfrak z_{1/2} = 0$    $\mathfrak z_{1/2} > 0$      $T^+ > c_1$
  $\mathfrak z_{1/2} = 0$    $\mathfrak z_{1/2} < 0$      $T^+ < c_2$
  $\mathfrak z_{1/2} = 0$    $\mathfrak z_{1/2} \ne 0$    $T^+ < c_3$ or $T^+ > c_4$
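The ranking procedure above can be sketched in a few lines of Python (the function name is our own). The sketch assumes no zero differences and no ties among the $|x_i - \mathfrak z_0|$, which holds with probability 1 for a continuous DF. Applied to the data of Example 5 with $\mathfrak z_0 = -1.0$, it gives $T^+ = 35$, the value found in Example 8 below.

```python
def signed_rank_sums(xs, center=0.0):
    """Return (T+, T-) for the differences x_i - center.

    Assumes no zero differences and no ties among |x_i - center|,
    which holds with probability 1 for a continuous DF.
    """
    d = [x - center for x in xs]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # R_i^+ = rank of |d_i| among |d_1|, ..., |d_n|
    t_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    t_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return t_plus, t_minus


example5 = [-0.465, 0.120, -0.238, -0.869, -1.016, 0.417, 0.056, 0.561]
t_plus, t_minus = signed_rank_sums(example5, center=-1.0)
n = len(example5)
print(t_plus, t_minus, t_plus + t_minus == n * (n + 1) // 2)  # checks (21)
```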


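We now show how the Wilcoxon signed-ranks test statistic is related to the U-statistic estimate of $g_2(F) = \Pr(X_1 + X_2 > 0)$. Recall from Example 13.2.6 that the corresponding U-statistic is

$$U_2(\mathbf X) = \binom n2^{-1}\sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}. \tag{24}$$

First note that

$$\sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]} = \sum_{1 \le i < j \le n} I_{[X_{(i)} + X_{(j)} > 0]}. \tag{25}$$

Next note that for i < j, $X_{(i)} + X_{(j)} > 0$ if and only if $X_{(j)} > 0$ and $|X_{(i)}| < X_{(j)}$. It follows that $\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]}$ is the signed rank of $X_{(j)}$, that is, the rank of $|X_{(j)}|$ if $X_{(j)} > 0$ and 0 otherwise. Consequently,

$$\begin{aligned} T^+ &= \sum_{j=1}^{n}\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]} = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]}\\ &= \sum_{i=1}^{n} I_{[X_i > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}\\ &= nU_1(\mathbf X) + \binom n2 U_2(\mathbf X), \end{aligned} \tag{26}$$

where $U_1$ is the U-statistic for $g_1(F) = \Pr(X_1 > 0)$.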
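Identity (26) can be checked on a small sample. In the made-up data below (our own illustration), ranking the absolute values gives 0.1 (−), 0.3 (+), 0.5 (−), 0.7 (+), so $T^+ = 2 + 4 = 6$; the sketch recovers the same value from the two U-statistics.

```python
from itertools import combinations
from math import comb


def t_plus_via_u_statistics(xs):
    """Compute T+ as n*U1 + C(n, 2)*U2, following (26).

    n*U1 counts the X_i > 0, and C(n, 2)*U2 counts the pairs i < j with X_i + X_j > 0.
    Returns (T+, U1, U2).
    """
    n = len(xs)
    n_u1 = sum(1 for x in xs if x > 0)
    n2_u2 = sum(1 for a, b in combinations(xs, 2) if a + b > 0)
    u1, u2 = n_u1 / n, n2_u2 / comb(n, 2)
    return n_u1 + n2_u2, u1, u2


xs = [-0.5, 0.3, -0.1, 0.7]
print(t_plus_via_u_statistics(xs))
```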

We next compute the distribution of $T^+$ for small samples. The distribution of $T^+$ is tabulated by Kraft and Van Eeden [53, pp. 221–223].

Let

$$Z_{(i)} = \begin{cases} 1 & \text{if the } |X_j| \text{ that has rank } i \text{ is } > 0,\\ 0 & \text{otherwise}. \end{cases}$$

Note that $T^+ = 0$ if all differences have negative signs, and $T^+ = n(n+1)/2$ if all differences have positive signs. Here a difference means a difference between the observations and the postulated value of the median. $T^+ = \sum_{i=1}^{n} iZ_{(i)}$ is completely determined by the indicators $Z_{(i)}$, so that the sample space can be considered as a set of $2^n$ n-tuples $(z_1, z_2, \ldots, z_n)$, where each $z_i$ is 0 or 1. Under $H_0$, $\mathfrak z_{1/2} = \mathfrak z_0$ and each arrangement is equally likely. Thus

$$P_{H_0}\{T^+ = t\} = \frac{\#\{\text{ways to assign + or − signs to the integers } 1, 2, \ldots, n \text{ so that the sum of the integers with a + sign is } t\}}{2^n} = \frac{n(t)}{2^n}, \text{ say}. \tag{27}$$

Note that every assignment has a conjugate assignment with plus and minus signs interchanged, so that for this conjugate, $T^+$ is given by

$$\sum_{i=1}^{n} i(1 - Z_{(i)}) = \frac{n(n+1)}{2} - \sum_{i=1}^{n} iZ_{(i)}. \tag{28}$$

Thus under $H_0$ the distribution of $T^+$ is symmetric about the mean n(n+1)/4.


Example 7. Let us compute the null distribution for n = 3. $E_{H_0}T^+ = n(n+1)/4 = 3$, and $T^+$ takes values from 0 to $n(n+1)/2 = 6$:

  Value of T+    Ranks associated with positive differences    n(t)
  6              1, 2, 3                                       1
  5              2, 3                                          1
  4              1, 3                                          1
  3              1, 2; 3                                       2

so that

$$P_{H_0}\{T^+ = t\} = \begin{cases} \tfrac18, & t = 0, 1, 2, 4, 5, 6,\\ \tfrac28, & t = 3,\\ 0, & \text{otherwise}. \end{cases} \tag{29}$$

Similarly, for n = 4, one can show that

$$P_{H_0}\{T^+ = t\} = \begin{cases} \tfrac1{16}, & t = 0, 1, 2, 8, 9, 10,\\ \tfrac2{16}, & t = 3, 4, 5, 6, 7,\\ 0, & \text{otherwise}. \end{cases} \tag{30}$$

An alternative procedure would be to use the MGF technique. Under $H_0$, the RVs $iZ_{(i)}$ are independent and have the PMF

$$P\{iZ_{(i)} = i\} = P\{iZ_{(i)} = 0\} = \tfrac12.$$

Thus

$$M(t) = Ee^{tT^+} = \prod_{i=1}^{n}\frac{e^{it} + 1}{2}. \tag{31}$$

We express M(t) as a sum of terms of the form $\alpha_j e^{jt}/2^n$. The PMF of $T^+$ can then be determined by inspection. For example, in the case n = 4, we have

$$\begin{aligned} M(t) &= \frac{e^t + 1}{2}\cdot\frac{e^{2t} + 1}{2}\cdot\frac{e^{3t} + 1}{2}\cdot\frac{e^{4t} + 1}{2} && (32)\\ &= \frac{1}{2^4}(e^{3t} + e^{2t} + e^{t} + 1)(e^{7t} + e^{4t} + e^{3t} + 1) && (33)\\ &= \frac{1}{2^4}(e^{10t} + e^{9t} + e^{8t} + 2e^{7t} + 2e^{6t} + 2e^{5t} + 2e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1). && (34) \end{aligned}$$

This method gives us the PMF of $T^+$ for n = 2, n = 3, and n = 4 immediately. Quite simply,

$$P_{H_0}\{T^+ = j\} = \text{coefficient of } e^{jt} \text{ in the expansion of } M(t), \quad j = 0, 1, \ldots, n(n+1)/2. \tag{35}$$

See Problem 3.3.12 for the PGF of $T^+$.
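Rule (35) amounts to multiplying out the polynomial $\prod_{i=1}^{n}(1 + s^i)$, where $s = e^t$, and reading off its coefficients $n(t)$. A minimal sketch (function names are our own) that builds these counts one factor at a time and returns exact tail probabilities:

```python
from fractions import Fraction


def signed_rank_null_counts(n):
    """Coefficients n(t), t = 0, ..., n(n+1)/2, of prod_{i=1}^n (1 + s^i).

    Under H0, P(T+ = t) = n(t) / 2^n, as in (27) and (35).
    """
    counts = [1]
    for i in range(1, n + 1):
        new = counts + [0] * i  # room for the extra factor s^i
        for t, c in enumerate(counts):
            new[t + i] += c     # multiply by s^i and add
        counts = new
    return counts


def signed_rank_upper_tail(n, t):
    """Exact P_{H0}(T+ >= t)."""
    counts = signed_rank_null_counts(n)
    return Fraction(sum(counts[t:]), 2 ** n)


print(signed_rank_null_counts(4))
print(signed_rank_upper_tail(8, 35))  # exact P-value for T+ = 35 with n = 8
```

For n = 4 the counts reproduce (34), and for the data of Example 8 below ($n = 8$, $T^+ = 35$) the exact one-sided P-value is $2/256 \approx 0.0078$.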


Example 8. Let us return to the data of Example 5 and test $H_0: \mathfrak z_{1/2} = \mu = -1.0$ against $H_1: \mathfrak z_{1/2} > -1.0$. Ranking $|x_i - \mathfrak z_{1/2}|$ in increasing order of magnitude, we have (the observation index i is given below each value)

  0.016 < 0.131 < 0.535 < 0.762 < 1.056 < 1.120 < 1.417 < 1.561
    5       4       1       3       7       2       6       8

Thus

$$r_1 = 3,\ r_2 = 6,\ r_3 = 4,\ r_4 = 2,\ r_5 = 1,\ r_6 = 7,\ r_7 = 5,\ r_8 = 8,$$

and, since only $x_5 + 1.0 = -0.016$ is negative,

$$T^+ = 3 + 6 + 4 + 2 + 7 + 5 + 8 = 35.$$

From Table ST10, $H_0$ is rejected at level α = 0.05 if $T^+ > 31$. Since 35 > 31, we reject $H_0$.

Remark 1. The Wilcoxon test statistic can also be used to test for symmetry. Let $X_1, X_2, \ldots, X_n$ be iid observations on an RV with absolutely continuous DF F. We set the null hypothesis as

$H_0$: $\mathfrak z_{1/2} = \mathfrak z_0$, and DF F is symmetric about $\mathfrak z_0$.

The alternative is

$H_1$: $\mathfrak z_{1/2} \ne \mathfrak z_0$ and F symmetric, or F asymmetric.

The test is the same since the null distribution of $T^+$ is the same.

Remark 2. If we have n independent pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ from a bivariate DF, we form the differences $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$. Assuming that $Z_1, Z_2, \ldots, Z_n$ are (independent) observations from a population of differences with absolutely continuous DF F that is symmetric with median $\mathfrak z_{1/2}$, we can use the Wilcoxon statistic to test $H_0: \mathfrak z_{1/2} = \mathfrak z_0$.


We present some examples.
Example 9. For the data of Example 10.3.3, let us apply the Wilcoxon statistic to test $H_0: \mathfrak z_{1/2} = 0$ and F is symmetric against $H_1: \mathfrak z_{1/2} \ne 0$ and F symmetric, or F not symmetric.

The absolute values, when arranged in increasing order of magnitude, are as follows (the observation index i is given below each value):

  0.057 < 0.068 < 0.137 < 0.261 < 0.323 < 0.464 < 0.482 < 0.486
   13       5       2      17       4       1      11      15
  < 0.508 < 0.513 < 0.525 < 0.595 < 0.881 < 0.906 < 1.046
     20       7       8       9      10       6      19
  < 1.229 < 1.237 < 1.678 < 1.787 < 2.455
     14      18      12      16       3

Thus

$$\begin{aligned} &r_1 = 6,\ r_2 = 3,\ r_3 = 20,\ r_4 = 5,\ r_5 = 2,\ r_6 = 14,\ r_7 = 10,\ r_8 = 11,\ r_9 = 12,\ r_{10} = 13,\\ &r_{11} = 7,\ r_{12} = 18,\ r_{13} = 1,\ r_{14} = 16,\ r_{15} = 8,\ r_{16} = 19,\ r_{17} = 4,\ r_{18} = 17,\ r_{19} = 15,\ r_{20} = 9, \end{aligned}$$

and, summing the ranks of the positive observations,

$$T^+ = 6 + 3 + 20 + 14 + 12 + 13 + 18 + 17 + 15 = 118.$$

From Table ST10 we see that $H_0$ cannot be rejected even at level α = 0.20.

Example 10. Returning to the data of Example 6, we apply the Wilcoxon test to the differences $Z_i = X_i - Y_i$. The differences are −6, 3, 1, −8, −17, −20, −3, −3, −11, 9. To test $H_0: \mathfrak z_{1/2} = 0$ against $H_1: \mathfrak z_{1/2} \ne 0$, we rank the absolute values of the $z_i$ in increasing order to get

$$1 < 3 = 3 = 3 < 6 < 8 < 9 < 11 < 17 < 20$$

and

$$T^+ = 1 + 2 + 7 = 10.$$

Here we have assigned ranks 2, 3, 4 to the observations +3, −3, −3, respectively. (If we assign rank 4 to the observation +3, then $T^+ = 12$, without appreciably changing the result.)

From Table ST10 we reject $H_0$ at α = 0.05 if either $T^+ > 46$ or $T^+ < 9$. Since $9 < T^+ < 46$, we accept $H_0$. Note that hypothesis $H_0$ was also accepted by the sign test.


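For large samples we use the normal approximation. In fact, from (26) we see that

$$\frac{\sqrt n\,(T^+ - ET^+)}{\binom n2} = \frac{n^{3/2}}{\binom n2}(U_1 - EU_1) + \sqrt n\,(U_2 - EU_2).$$

Clearly, $U_1 - EU_1 \to 0$ in probability, and since $n^{3/2}/\binom n2 \to 0$, the first term → 0 in probability as $n \to \infty$. By Slutsky's theorem (Theorem 6.2.15) it follows that

$$\frac{\sqrt n\,(T^+ - ET^+)}{\binom n2} \quad\text{and}\quad \sqrt n\,(U_2 - EU_2)$$

have the same limiting distribution. From Theorem 13.2.3 and Example 13.2.7 it follows that $\sqrt n\,(U_2 - EU_2)$, and hence $\sqrt n\,(T^+ - ET^+)/\binom n2$, has a limiting normal distribution with mean 0 and variance

$$4\zeta_1 = 4\Pr(X_1 + X_2 > 0,\ X_1 + X_3 > 0) - 4\Pr^2(X_1 + X_2 > 0).$$

Under $H_0$, the RVs $Z_{(i)}$ are independent b(1, 1/2), so

$$E_{H_0}T^+ = \frac{n(n+1)}{4} \quad\text{and}\quad \operatorname{var}_{H_0}T^+ = \frac14\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{24}.$$

Also, under $H_0$, F is continuous and symmetric, so

$$\Pr(X_1 + X_2 > 0) = \int_{-\infty}^{\infty}\Pr(X_2 > -x)\,f(x)\,dx = \int_{-\infty}^{\infty}F(x)\,f(x)\,dx = \frac12$$

and

$$\Pr(X_1 + X_2 > 0,\ X_1 + X_3 > 0) = \int_{-\infty}^{\infty}[\Pr(X_2 > -x)]^2 f(x)\,dx = \int_{-\infty}^{\infty}F^2(x)\,f(x)\,dx = \frac13.$$

Thus $4\zeta_1 = \tfrac43 - 1 = \tfrac13$, so that, under $H_0$,

$$\frac{\sqrt n\,(T^+ - E_{H_0}T^+)}{\binom n2} \xrightarrow{L} N\left(0, \tfrac13\right).$$

Moreover,

$$\frac{(\operatorname{var}_{H_0}T^+)^{1/2}}{\binom n2/\sqrt n} = \frac{[n(n+1)(2n+1)/24]^{1/2}}{n(n-1)/(2\sqrt n)} \to \frac{1}{\sqrt 3}$$

as $n \to \infty$. Consequently, under $H_0$,

$$\frac{T^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \xrightarrow{L} N(0, 1),$$

that is,

$$T^+ \sim AN\left(\frac{n(n+1)}{4},\ \frac{n(n+1)(2n+1)}{24}\right).$$

Thus, for large enough n we can determine the critical values for a test based on $T^+$ by using the normal approximation.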

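As an example, take n = 20. From Table ST10 the P-value associated with $t^+ = 140$ is 0.10. The normal approximation, with $E_{H_0}T^+ = 105$ and $\operatorname{var}_{H_0}T^+ = 717.5$, yields

$$P_{H_0}\{T^+ \ge 140\} \approx P\left(Z \ge \frac{140 - 105}{\sqrt{717.5}}\right) = P(Z \ge 1.31) \approx 0.095,$$

which is close to the tabulated value.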
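The normal approximation is easy to script. A minimal sketch (the function name is our own) that uses `math.erfc` for the standard normal upper tail and omits any continuity correction, matching the calculation above:

```python
import math


def signed_rank_normal_tail(n, t):
    """Normal approximation to P_{H0}(T+ >= t), using T+ ~ AN(n(n+1)/4, n(n+1)(2n+1)/24).

    Returns (z, approximate upper-tail probability); no continuity correction.
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    return z, 0.5 * math.erfc(z / math.sqrt(2))


z, p = signed_rank_normal_tail(20, 140)
print(round(z, 3), round(p, 4))
```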


PROBLEMS 13.3 


1. Prove Theorem 4.

2. A random sample of size 16 from a continuous DF on [0, 1] yields the following data: 0.59, 0.72, 0.47, 0.43, 0.31, 0.56, 0.22, 0.90, 0.96, 0.78, 0.66, 0.18, 0.73, 0.43, 0.58, 0.11. Test the hypothesis that the sample comes from U[0, 1].

3. Test the goodness of fit of normality for the data of Problem 10.3.6 using the Kolmogorov-Smirnov test.

4. For the data of Problem 10.3.6, find a 0.95 level confidence band for the distribution function.

5. The following data represent a sample of size 20 from U[0, 1]: 0.277, 0.435, 0.130, 0.143, 0.853, 0.889, 0.294, 0.697, 0.940, 0.648, 0.324, 0.482, 0.540, 0.152, 0.477, 0.667, 0.741, 0.882, 0.885, 0.740. Construct a 0.90 level confidence band for F(x).

6. In Problem 5, test the hypothesis that the distribution is U[0, 1]. Take α = 0.05.

7. For the data of Example 2, test, by means of the sign test, the null hypothesis $H_0: \mu = 1.5$ against $H_1: \mu \ne 1.5$.

8. For the data of Problem 5, test the hypothesis that the quantile of order p = 0.20 is 0.20.

9. For the data of Problem 10.4.8, use the sign test to test the hypothesis of no difference between the two averages.

10. Use the sign test for the data of Problem 10.4.9 to test the hypothesis of no difference in grade-point averages.

11. For the data of Problem 5, apply the signed-rank test to test $H_0: \mathfrak z_{1/2} = 0.5$ against $H_1: \mathfrak z_{1/2} \ne 0.5$.

12. For the data of Problems 10.4.8 and 10.4.9, apply the signed-rank test to the differences to test $H_0: \mathfrak z_{1/2} = 0$ against $H_1: \mathfrak z_{1/2} \ne 0$.


13.4 SOME TWO-SAMPLE PROBLEMS


In this section we consider some two-sample tests. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from two absolutely continuous distribution
functions $F_X$ and $F_Y$, respectively. The problem is to test the null hypothesis
$H_0: F_X(x) = F_Y(x)$ for all $x \in \mathbb{R}$ against the usual one- and two-sided alternatives.
Tests of $H_0$ depend on the type of alternative specified. We state some of the
alternatives of interest even though we do not consider all of these in this book.


I. Location alternative: $F_Y(x) = F_X(x - \theta)$, $\theta \ne 0$.
II. Scale alternative: $F_Y(x) = F_X(x/\sigma)$, $\sigma > 0$.
III. Lehmann alternative: $F_Y(x) = 1 - [1 - F_X(x)]^{\theta + 1}$, $\theta + 1 > 0$.
IV. Stochastic alternative: $F_Y(x) \ge F_X(x)$ for all $x$, and $F_Y(x) > F_X(x)$ for at
least one $x$.
V. General alternative: $F_Y(x) \ne F_X(x)$ for some $x$.


Some comments are in order. Clearly, I through IV are special cases of V. Alternatives
I and II show differences in $F_X$ and $F_Y$ in location and scale, respectively.
Alternative III states that $P(Y > x) = [P(X > x)]^{\theta + 1}$. In the special case when $\theta$ is
an integer, it states that $Y$ has the same distribution as the smallest of $\theta + 1$
$X$-variables. A similar alternative that is sometimes used is $F_Y(x) = [F_X(x)]^{\alpha}$
for some $\alpha > 0$ and all $x$. When $\alpha$ is an integer, this states that $Y$ is distributed as the
largest of $\alpha$ $X$-variables. Alternative IV refers to the relative magnitudes of $X$'s
and $Y$'s. It states that

$$P(Y \le x) \ge P(X \le x) \quad \text{for all } x,$$

so that

(1) $$P(Y > x) \le P(X > x)$$

for all $x$. In other words, $X$'s tend to be larger than the $Y$'s.


Definition 1. We say that a continuous RV X is stochastically larger than a con- 
tinuous RV Y if inequality (1) is satisfied for all x with strict inequality for some x. 


A similar interpretation may be given to the one-sided alternative $F_X > F_Y$. In the
special case where both $X$ and $Y$ are normal RVs with means $\mu_1, \mu_2$ and common
variance $\sigma^2$, $F_X = F_Y$ corresponds to $\mu_1 = \mu_2$ and $F_X > F_Y$ corresponds to
$\mu_1 < \mu_2$.

In this section we consider some common two-sample tests for location (case
I) and stochastic ordering (case IV) alternatives. First, note that a test of stochastic
ordering may also be used as a test of the more restrictive location alternatives since,
for example, $F_X > F_Y$ corresponds to larger $Y$'s and hence larger location for $Y$.
Second, we note that the chi-square test of homogeneity described in Section 10.3
can be used to test general alternatives (case V) $H_1: F(x) \ne G(x)$ for some $x$.
Briefly, one partitions the real line into Borel sets $A_1, A_2, \ldots, A_k$. Let

$$p_{i1} = P(X_j \in A_i) \quad \text{and} \quad p_{i2} = P(Y_j \in A_i),$$

$i = 1, 2, \ldots, k$. Under $H_0: F = G$, $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$, which is the
problem of testing equality of two independent multinomial distributions discussed
in Section 10.3.

We first consider a simple test of location. This test, based on the sample median
of the combined sample, is a test of the equality of medians of the two DFs. It will
tend to accept $H_0: F = G$ even if the shapes of $F$ and $G$ are different as long as
their medians are equal.


13.4.1 Median Test


The combined sample $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$ is ordered and a sample
median is found. If $m + n$ is odd, the median is the $[(m + n + 1)/2]$th value in the ordered
arrangement. If $m + n$ is even, the median is any number between the two middle
values. Let $V$ be the number of observed values of $X$ that are $\le$ the sample median
for the combined sample. If $V$ is large, it is reasonable to conclude that the actual
median of $X$ is smaller than the median of $Y$. One therefore rejects $H_0: F = G$
in favor of $H_1$: $F(x) \ge G(x)$ for all $x$ and $F(x) > G(x)$ for some $x$ if $V$ is too
large, that is, if $V \ge c$. If, however, the alternative is $F(x) \le G(x)$ for all $x$ and
$F(x) < G(x)$ for some $x$, the median test rejects $H_0$ if $V \le c$. For the two-sided
alternative that $F(x) \ne G(x)$ for some $x$, we use the two-sided test.

We next compute the null distribution of the RV $V$. If $m + n = 2p$, $p$ a positive
integer, then

(2) $$P_{H_0}\{V = v\} = P_{H_0}\{\text{exactly } v \text{ of the } X_i\text{'s are} \le \text{the combined median}\} = \begin{cases} \binom{m}{v}\binom{n}{p-v} \Big/ \binom{m+n}{p}, & v = 0, 1, 2, \ldots, m, \\ 0, & \text{otherwise}. \end{cases}$$

Here $0 \le V \le \min(m, p)$. If $m + n = 2p + 1$, $p \ge 0$ an integer, the $[(m + n + 1)/2]$th
value is the median in the combined sample, and

(3) $$P_{H_0}\{V = v\} = P_{H_0}\{\text{exactly } v \text{ of the } X_i\text{'s are below the } (p+1)\text{th value in the ordered arrangement}\} = \begin{cases} \binom{m}{v}\binom{n}{p-v} \Big/ \binom{m+n}{p}, & v = 0, 1, \ldots, \min(m, p), \\ 0, & \text{otherwise}. \end{cases}$$


Remark 1. Under Ho we expect (m + n)/2 observations above the median and 
(m + n)/2 below the median. One can therefore apply the chi-square test with 1 d.f. 
to test Ho against the two-sided alternative. 


Example 1. The following data represent lifetimes (hours) of batteries for two 
different brands: 


Brand A: 40 30 40 45 55 30
Brand B: 50 50 45 55 60 40


The combined ordered sample is 30, 30, 40, 40, 40, 45, 45, 50, 50, 55, 55, 60.
Since $m + n = 12$ is even and both middle values equal 45, the median is 45. Thus

$v$ = number of observed values of $X$ that are less than or equal to 45 = 5.

Now

$$P_{H_0}\{V \ge 5\} = \frac{\binom{6}{5}\binom{6}{1} + \binom{6}{6}\binom{6}{0}}{\binom{12}{6}} = \frac{37}{924} \approx 0.04.$$

Since $P_{H_0}\{V \ge 5\} > 0.025$, we cannot reject $H_0$ (two-sided test at level 0.05) that
the two samples come from the same population.
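The following Python sketch checks the exact tail probability. It assumes $m + n$ is even and uses the hypergeometric null distribution (2). The helper name `median_test_upper_tail` is ours. As in the example, the code handles ties at the median by counting the $X$'s that are $\le$ the median.

```python
from fractions import Fraction
from math import comb


def median_test_upper_tail(m, n, v_obs):
    """Exact P_H0(V >= v_obs) for the median test when m + n = 2p is even.

    Under H0, V is hypergeometric: P(V = v) = C(m, v) C(n, p - v) / C(m + n, p).
    """
    assert (m + n) % 2 == 0, "this sketch only handles m + n even"
    p = (m + n) // 2
    total = comb(m + n, p)
    upper = sum(comb(m, v) * comb(n, p - v) for v in range(v_obs, min(m, p) + 1))
    return Fraction(upper, total)


brand_a = [40, 30, 40, 45, 55, 30]   # X sample
brand_b = [50, 50, 45, 55, 60, 40]   # Y sample
combined = sorted(brand_a + brand_b)
median = combined[len(combined) // 2 - 1]   # both middle values equal 45 here
v = sum(x <= median for x in brand_a)       # number of X's at or below the median
tail = median_test_upper_tail(len(brand_a), len(brand_b), v)
print(median, v, tail, float(tail))
```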
We now consider two tests of the stochastic alternatives. As mentioned earlier,
they may also be used as tests of location. 


13.4.2 Kolmogorov-Smirnov Test 


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
continuous DFs $F$ and $G$, respectively. Let $F_m^*$ and $G_n^*$, respectively, be the empirical
DFs of the $X$'s and $Y$'s. Recall that $F_m^*(x)$ is the U-statistic for $F(x)$, and $G_n^*(x)$ that for $G(x)$.
Under $H_0: F(x) = G(x)$ for all $x$, we expect a reasonable agreement between the
two sample DFs. We define

(4) $$D_{m,n} = \sup_x |F_m^*(x) - G_n^*(x)|.$$

Then $D_{m,n}$ may be used to test $H_0$ against the two-sided alternative $H_1: F(x) \ne G(x)$
for some $x$. The test rejects $H_0$ at level $\alpha$ if

(5) $$D_{m,n} \ge D_{m,n,\alpha},$$

where $P_{H_0}\{D_{m,n} \ge D_{m,n,\alpha}\} \le \alpha$.
Similarly, one can define the one-sided statistics

(6) $$D_{m,n}^+ = \sup_x [F_m^*(x) - G_n^*(x)]$$

and

(7) $$D_{m,n}^- = \sup_x [G_n^*(x) - F_m^*(x)],$$

to be used against the one-sided alternatives

(8) $G(x) \le F(x)$ for all $x$ and $G(x) < F(x)$ for some $x$,
with rejection region $D_{m,n}^+ \ge D_{m,n,\alpha}^+$,

and

(9) $F(x) \le G(x)$ for all $x$ and $F(x) < G(x)$ for some $x$,
with rejection region $D_{m,n}^- \ge D_{m,n,\alpha}^-$,

respectively.

For small samples, tables due to Massey [70] are available. In Table ST9 we give
the values of $D_{m,n,\alpha}$ and $D_{m,n,\alpha}^+$ for some selected values of $m$, $n$, and $\alpha$. Table ST8
gives the corresponding values for the $m = n$ case.

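For large samples we use the limiting result due to Smirnov [105]. Let
$N = mn/(m + n)$. Then

(10) $$\lim_{m,n \to \infty} P\{\sqrt{N}\, D_{m,n}^+ \le \lambda\} = \begin{cases} 1 - e^{-2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \le 0, \end{cases}$$

and

(11) $$\lim_{m,n \to \infty} P\{\sqrt{N}\, D_{m,n} \le \lambda\} = \begin{cases} 1 - 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2j^2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \le 0. \end{cases}$$

Relations (10) and (11) give the limiting distributions of $D_{m,n}^+$ and $D_{m,n}$, respectively, under
$H_0: F(x) = G(x)$ for all $x \in \mathbb{R}$.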
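The limiting laws (10) and (11) give large-sample P-values directly. In the Python sketch below the function names are ours, and truncating the alternating series in (11) after 100 terms is an implementation choice, not part of the theory.

```python
import math


def smirnov_one_sided_cdf(lam):
    """Limiting P{sqrt(N) D+_{m,n} <= lam} under H0, relation (10)."""
    return 1.0 - math.exp(-2.0 * lam * lam) if lam > 0 else 0.0


def smirnov_two_sided_cdf(lam, terms=100):
    """Limiting P{sqrt(N) D_{m,n} <= lam} under H0, relation (11).

    For lam < 0.2 the limiting CDF is below 1e-12, so 0 is returned there;
    for lam >= 0.2 the series truncated after `terms` terms is ample.
    """
    if lam < 0.2:
        return 0.0
    series = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                 for j in range(1, terms + 1))
    return 1.0 - 2.0 * series


def asymptotic_p_value(d, m, n, two_sided=True):
    """Large-sample approximation to P_H0(D >= d) (or D+ >= d), N = mn/(m + n)."""
    lam = math.sqrt(m * n / (m + n)) * d
    cdf = smirnov_two_sided_cdf(lam) if two_sided else smirnov_one_sided_cdf(lam)
    return 1.0 - cdf


print(round(smirnov_two_sided_cdf(1.36), 4))    # near 0.95: the familiar 5% point
print(round(smirnov_one_sided_cdf(1.2239), 4))  # near 0.95 for the one-sided law
```

For very small samples such as $m = n = 6$ this approximation is crude, and the exact tables should be preferred.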


Example 2. Let us apply the test to the data from Example 1. Do the two brands
differ with respect to average life?

Let us first apply the Kolmogorov-Smirnov test to test $H_0$ that the population
distribution of length of life for the two brands is the same.

   x     F*_6(x)    G*_6(x)    |F*_6(x) - G*_6(x)|
  30       2/6         0              2/6
  40       4/6        1/6             3/6
  45       5/6        2/6             3/6
  50       5/6        4/6             1/6
  55        1         5/6             1/6
  60        1          1               0

$$D_{6,6} = \sup_x |F_6^*(x) - G_6^*(x)| = \frac{3}{6} = \frac{1}{2}.$$


From Table ST8 the critical value for $m = n = 6$ at level $\alpha = 0.05$ is $D_{6,6,0.05} = 5/6$.
Since $D_{6,6} = 1/2 < D_{6,6,0.05}$, we accept $H_0$ that the population distribution for the
length of life for the two brands is the same.
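The exact null distribution of $D_{6,6}$ can be checked by brute force. The Python sketch below uses helper names of our own. It evaluates $D_{m,n}$ at every observed point, which is enough because both empirical DFs are step functions. It then enumerates all $\binom{12}{6} = 924$ equally likely assignments of the ranks $1, \ldots, 12$ to the $X$ sample under $H_0$, assuming a continuous population so that there are no ties.

```python
from fractions import Fraction
from itertools import combinations


def ks_two_sample(xs, ys):
    """D_{m,n} = sup_x |F*_m(x) - G*_n(x)|, evaluated at every observed point."""
    m, n = len(xs), len(ys)
    points = sorted(set(xs) | set(ys))
    return max(abs(Fraction(sum(x <= t for x in xs), m)
                   - Fraction(sum(y <= t for y in ys), n))
               for t in points)


def ks_null_tail(m, n, d):
    """Exact P_H0(D_{m,n} >= d) by enumerating rank assignments (no ties)."""
    ranks = range(1, m + n + 1)
    count = total = 0
    for x_ranks in combinations(ranks, m):
        x_set = set(x_ranks)
        y_ranks = [r for r in ranks if r not in x_set]
        total += 1
        if ks_two_sample(list(x_ranks), y_ranks) >= d:
            count += 1
    return Fraction(count, total)


brand_a = [40, 30, 40, 45, 55, 30]
brand_b = [50, 50, 45, 55, 60, 40]
d_obs = ks_two_sample(brand_a, brand_b)
print(d_obs)
print(ks_null_tail(6, 6, Fraction(4, 6)), ks_null_tail(6, 6, Fraction(5, 6)))
```

The enumeration should give $P_{H_0}\{D_{6,6} \ge 4/6\} = 132/924 \approx 0.14$ and $P_{H_0}\{D_{6,6} \ge 5/6\} = 24/924 \approx 0.026$, so the observed value $1/2$ is far from significant.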

Let us next apply the two-sample t-test. We have $\bar{x} = 40$, $\bar{y} = 50$, $s_1^2 = 90$,
$s_2^2 = 50$, $s_p^2 = 70$. Thus

$$t = \frac{40 - 50}{\sqrt{70}\sqrt{1/6 + 1/6}} \approx -2.07.$$

Since $t_{10, 0.025} = 2.2281$ and $|t| < 2.2281$, we accept the hypothesis that the two samples
come from the same (normal) population.
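The pooled two-sample $t$ statistic of this example can be recomputed with Python's standard `statistics` module. The following is a minimal sketch that assumes the equal-variance normal model used in the text. The function name `pooled_t` is ours.

```python
import math
from statistics import mean, variance


def pooled_t(xs, ys):
    """Two-sample t statistic with pooled variance s_p^2 (equal-variance model)."""
    m, n = len(xs), len(ys)
    sp2 = ((m - 1) * variance(xs) + (n - 1) * variance(ys)) / (m + n - 2)
    t = (mean(xs) - mean(ys)) / math.sqrt(sp2 * (1 / m + 1 / n))
    return t, sp2, m + n - 2   # statistic, pooled variance, degrees of freedom


t, sp2, df = pooled_t([40, 30, 40, 45, 55, 30], [50, 50, 45, 55, 60, 40])
print(round(t, 2), sp2, df)
```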
The second test of stochastic ordering alternatives we consider is the Mann-
Whitney-Wilcoxon test, which can be viewed as a test based on a U-statistic. 


13.4.3 Mann-Whitney-Wilcoxon Test


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from two
continuous DFs, $F$ and $G$, respectively. As in Example 13.2.9, let

$$T(X_i; Y_j) = \begin{cases} 1 & \text{if } X_i < Y_j, \\ 0 & \text{otherwise}, \end{cases}$$

for $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. Recall that $T(X_i; Y_j)$ is an unbiased estimator
of $g(F, G) = P_{F,G}(X < Y)$ and the two-sample U-statistic for $g$ is given
by $U(X; Y) = (mn)^{-1} \sum_{i=1}^m \sum_{j=1}^n T(X_i; Y_j)$. For notational convenience, let us
write

(12) $$U = mn\,U(X; Y) = \sum_{i=1}^m \sum_{j=1}^n T(X_i; Y_j).$$
Then $U$ is the total number of pairs $(X_i, Y_j)$ with $X_i < Y_j$, that is, the sum over $j$ of the
number of $X$'s that are smaller than $Y_j$. The statistic $U$ is called the Mann-Whitney statistic.
An alternative equivalent form using Wilcoxon scores is the linear rank statistic given by

(13) $$W = \sum_{j=1}^n Q_j,$$

where $Q_j$ = rank of $Y_j$ among the combined $m + n$ observations. Indeed,

$$Q_j = \text{rank of } Y_j = (\text{no. of } X_i\text{'s} < Y_j) + \text{rank of } Y_j \text{ among the } Y\text{'s}.$$

Thus

(14) $$W = \sum_{j=1}^n Q_j = U + \sum_{j=1}^n j = U + \frac{n(n+1)}{2},$$

so that $U$ and $W$ are equivalent test statistics, hence the name Mann-Whitney-
Wilcoxon test. We restrict our attention to $U$ as the test statistic.


Example 3. Let $m = 4$, $n = 3$, and suppose that the combined sample when
ordered is as follows:

$$x_2 < x_1 < y_3 < y_2 < x_4 < y_1 < x_3.$$

Then $U = 7$, since there are three values of $x < y_1$, two values of $x < y_2$, and two
values of $x < y_3$. Also, $W = 3 + 4 + 6 = 13$, so $U = 13 - 3(4)/2 = 7$.


Note that $U = 0$ if all the $X_i$'s are larger than all the $Y_j$'s and $U = mn$ if all the
$X_i$'s are smaller than all the $Y_j$'s, because then there are $m$ $X$'s $< Y_1$, $m$ $X$'s $< Y_2$,
and so on. Thus $0 \le U \le mn$. If $U$ is large, the values of $Y$ tend to be larger than
the values of $X$ ($Y$ is stochastically larger than $X$), and this supports the alternative
$F(x) \ge G(x)$ for all $x$ and $F(x) > G(x)$ for some $x$. Similarly, if $U$ is small,
the $Y$ values tend to be smaller than the $X$ values, and this supports the alternative
$F(x) \le G(x)$ for all $x$ and $F(x) < G(x)$ for some $x$. We summarize these results as
follows:

$H_0$        $H_1$          Reject $H_0$ if:
$F = G$      $F > G$        $U \ge c_1$
$F = G$      $F < G$        $U \le c_2$
$F = G$      $F \ne G$      $U \ge c_3$ or $U \le c_4$





To compute the critical values we need the null distribution of $U$. Let

(15) $$p_{m,n}(u) = P_{H_0}\{U = u\}.$$

We will set up a difference equation relating $p_{m,n}$ to $p_{m-1,n}$ and $p_{m,n-1}$. If the
observations are arranged in increasing order of magnitude, the largest value can be
either an $x$ value or a $y$ value. Under $H_0$, all $\binom{m+n}{m}$ arrangements of the $x$'s and $y$'s
are equally likely, so the probability that the largest value will be an $x$ value is $m/(m + n)$
and that it will be a $y$ value is $n/(m + n)$.

Now, if the largest value is an $x$, it does not contribute to $U$, and the remaining
$m - 1$ values of $x$ and $n$ values of $y$ can be arranged to give the observed value
$U = u$ with probability $p_{m-1,n}(u)$. If the largest value is a $y$, this value is larger
than all the $m$ $x$'s. Thus, to get $U = u$, the remaining $n - 1$ values of $y$ and $m$ values
of $x$ must contribute $U = u - m$. It follows that

(16) $$p_{m,n}(u) = \frac{m}{m+n}\,p_{m-1,n}(u) + \frac{n}{m+n}\,p_{m,n-1}(u - m).$$

If $m = 0$, then for $n \ge 1$,

(17) $$p_{0,n}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u \ne 0. \end{cases}$$

If $n = 0$, $m \ge 1$, then

(18) $$p_{m,0}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u \ne 0, \end{cases}$$

and

(19) $$p_{m,n}(u) = 0 \quad \text{if } u < 0,\ m \ge 0,\ n \ge 0.$$


For small values of $m$ and $n$, one can easily compute the null PMF of $U$. Thus, if
$m = n = 1$, then

$$p_{1,1}(0) = \tfrac{1}{2} \quad \text{and} \quad p_{1,1}(1) = \tfrac{1}{2}.$$

If $m = 1$, $n = 2$, then

$$p_{1,2}(0) = p_{1,2}(1) = p_{1,2}(2) = \tfrac{1}{3}.$$
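The recursion (16), together with the boundary conditions (17) to (19), translates directly into a memoized function. The following Python sketch uses exact `Fraction` arithmetic, and the names are ours. As a check, the distribution for $m = 8$, $n = 9$ should have mean $mn/2 = 36$ and variance $mn(m + n + 1)/12 = 108$.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def p_mn(m, n, u):
    """P_H0(U = u) for m X's and n Y's, via recursion (16)."""
    if u < 0:
        return Fraction(0)                                   # condition (19)
    if m == 0 or n == 0:
        return Fraction(1) if u == 0 else Fraction(0)        # conditions (17), (18)
    return (Fraction(m, m + n) * p_mn(m - 1, n, u)
            + Fraction(n, m + n) * p_mn(m, n - 1, u - m))    # recursion (16)


def null_distribution(m, n):
    """List of P_H0(U = u) for u = 0, 1, ..., mn."""
    return [p_mn(m, n, u) for u in range(m * n + 1)]


def upper_tail(m, n, u_obs):
    """Exact P_H0(U >= u_obs)."""
    return sum(null_distribution(m, n)[u_obs:], Fraction(0))


print(null_distribution(1, 1))   # p_{1,1}
print(null_distribution(1, 2))   # p_{1,2}
print(float(upper_tail(8, 9, 54)))
```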


Tables for critical values are available for small values of $m$ and $n$, $m \le n$ (see,
e.g., Auble [2] or Mann and Whitney [69]). Table ST11 gives the values of $u_\alpha$ for
which $P_{H_0}\{U > u_\alpha\} \le \alpha$ for selected values of $m$, $n$, and $\alpha$.

If $m, n$ are large, we can use the asymptotic normality of $U$. In Example 13.2.10
we showed that under $H_0$,

$$\frac{U/(mn) - 1/2}{\sqrt{(m + n + 1)/(12mn)}} \xrightarrow{L} N(0, 1)$$

as $m, n \to \infty$ such that $m/(m + n) \to$ constant. The approximation is fairly good
for $m, n \ge 8$.


Example 4. Two samples are as follows:

Values of $X_i$: 1, 2, 3, 5, 7, 9, 11, 18
Values of $Y_j$: 4, 6, 8, 10, 12, 13, 14, 15, 19

Thus $m = 8$, $n = 9$, and $U = 3 + 4 + 5 + 6 + 7 + 7 + 7 + 7 + 8 = 54$. The (exact)
P-value is $P_{H_0}(U \ge 54) = 0.046$, so we reject $H_0$ at (two-tailed) level $\alpha = 0.1$. Let
us apply the normal approximation. We have

$$E_{H_0}U = \frac{8 \cdot 9}{2} = 36, \qquad \operatorname{var}_{H_0}(U) = \frac{8 \cdot 9\,(8 + 9 + 1)}{12} = 108,$$

and

$$Z = \frac{54 - 36}{\sqrt{108}} = \frac{18}{6\sqrt{3}} = \sqrt{3} = 1.732.$$

We note that $P(Z > 1.73) = 0.042$.
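This Python sketch reproduces the normal approximation of this example using the standard library only. It uses the data as listed, the null moments $mn/2$ and $mn(m + n + 1)/12$ from the text, and no continuity correction.

```python
import math

xs = [1, 2, 3, 5, 7, 9, 11, 18]           # X sample
ys = [4, 6, 8, 10, 12, 13, 14, 15, 19]    # Y sample
m, n = len(xs), len(ys)

U = sum(1 for x in xs for y in ys if x < y)   # Mann-Whitney statistic (12)
mean_u = m * n / 2                            # E_H0(U)
var_u = m * n * (m + n + 1) / 12              # var_H0(U)
z = (U - mean_u) / math.sqrt(var_u)
p_upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z), standard normal

print(U, mean_u, var_u, round(z, 3), round(p_upper, 3))
```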


PROBLEMS 13.4 


1. For the data of Example 4, apply the median test. 


2. Twelve 4-year-old boys and twelve 4-year-old girls were observed during two 15- 
minute play sessions, and each child's play during these two periods was scored 
as follows for incidence and degree of aggression: 
Boys: 86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50
Girls: 55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15 


Test the hypothesis that there were gender differences in the amount of aggres- 
sion shown, using (a) the median test, and (b) the Mann—Whitney—Wilcoxon test. 
(Siegel [103]) 


3. To compare the variability of two brands of tires, the following mileages (1000 
miles) were obtained for eight tires of each kind: 


Brand A: 32.1, 2.6, 17.8, 28.4, 19.6, 21.4, 19.9, 3.1 
Brand B: 19.8, 27.6, 3.8, 27.6, 34.1, 18.7, 16.9, 17.9 


Test the null hypothesis that the two samples come from the same population,
using the Mann-Whitney-Wilcoxon test.


4. Use the data of Problem 2 to apply the Kolmogorov-Smirnov test. 
5. Apply the Kolmogorov-Smirnov test to the data of Problem 3. 


6. Yet another test for testing $H_0: F = G$ against general alternatives is the runs
test. A run is a succession of one or more identical symbols which are preceded
and followed by a different symbol (or no symbol). The length of a run is the
number of like symbols in a run. The total number of runs, $R$, in the combined
sample of $X$'s and $Y$'s when arranged in increasing order can be used as a test of
$H_0$. Under $H_0$ the $X$ and $Y$ symbols are expected to be well mixed. A small value
of $R$ supports $H_1: F \ne G$. A test based on $R$ is appropriate only for two-sided
(general) alternatives. Tables of critical values are available. For large samples,
one uses the normal approximation:

$$R \sim AN\left(1 + \frac{2mn}{m+n},\ \frac{2mn(2mn - m - n)}{(m+n)^2(m+n-1)}\right).$$

(a) Let $R_1$ = number of $X$-runs, $R_2$ = number of $Y$-runs, and $R = R_1 + R_2$.
Under $H_0$, show that

$$P(R_1 = r_1, R_2 = r_2) = k\,\frac{\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}},$$

where $k = 2$ if $r_1 = r_2$, $k = 1$ if $|r_1 - r_2| = 1$, $r_1 = 1, 2, \ldots, m$ and
$r_2 = 1, 2, \ldots, n$.

(b) Show that

$$P_{H_0}\{R_1 = r_1\} = \frac{\binom{m-1}{r_1-1}\binom{n+1}{r_1}}{\binom{m+n}{m}}, \qquad 1 \le r_1 \le m.$$
7. Fifteen 3-year-old boys and fifteen 3-year-old girls were observed during two
sessions of recess in a nursery school. Each child’s play was scored for incidence 
and degree of aggression as follows: 


Boys: 96, 65, 74, 78, 82, 121, 68, 79, 111, 48, 53, 92, 81, 31, 40 
Girls: 12, 47, 32, 59, 83, 14, 32, 15, 17, 82, 21, 34, 9, 15, 51 


Is there evidence to suggest that there are gender differences in the incidence and 
amount of aggression? Use both Mann-Whitney-Wilcoxon and runs tests. 


13.5 TESTS OF INDEPENDENCE 


Let $X$ and $Y$ be two RVs with joint DF $F(x, y)$, and let $F_1$ and $F_2$, respectively, be
the marginal DFs of $X$ and $Y$. In this section we study some tests of the hypothesis
of independence, namely,

$$H_0: F(x, y) = F_1(x)F_2(y) \quad \text{for all } (x, y) \in \mathbb{R}^2$$

against the alternative

$$H_1: F(x, y) \ne F_1(x)F_2(y) \quad \text{for some } (x, y).$$

If the joint distribution function $F$ is bivariate normal, we know that $X$ and $Y$ are
independent if and only if the correlation coefficient $\rho = 0$. In this case, the test of
independence is to test $H_0: \rho = 0$.

In the nonparametric situation the most commonly used test of independence is 
the chi-square test, which we now study. 


13.5.1 Chi-Square Test of Independence (Contingency Tables)


Let $X$ and $Y$ be two RVs, and suppose that we have $n$ observations on $(X, Y)$. Let
us divide the space of values assumed by $X$ (the real line) into $r$ mutually exclusive
intervals $A_1, A_2, \ldots, A_r$. Similarly, the space of values of $Y$ is divided into $c$ disjoint
intervals $B_1, B_2, \ldots, B_c$. As a rule of thumb, we choose the length of each interval
in such a way that the probability that $X$ ($Y$) lies in an interval is approximately
$1/r$ ($1/c$). Moreover, it is desirable to have $n/r$ and $n/c$ at least equal to 5. Let $X_{ij}$
denote the number of pairs $(X_k, Y_k)$, $k = 1, 2, \ldots, n$, that lie in $A_i \times B_j$, and let

(1) $$p_{ij} = P\{(X, Y) \in A_i \times B_j\} = P\{X \in A_i \text{ and } Y \in B_j\},$$

where $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$. If each $p_{ij}$ is known, the quantity

(2) $$\sum_{i=1}^r \sum_{j=1}^c \frac{(X_{ij} - np_{ij})^2}{np_{ij}}$$

has approximately a chi-square distribution with $rc - 1$ d.f., provided that $n$ is large
(see Theorem 10.3.2). If $X$ and $Y$ are independent, $P\{(X, Y) \in A_i \times B_j\} = P\{X \in A_i\}P\{Y \in B_j\}$.
Let us write $p_{i\cdot} = P\{X \in A_i\}$ and $p_{\cdot j} = P\{Y \in B_j\}$. Then under
$H_0$: $p_{ij} = p_{i\cdot}\,p_{\cdot j}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$. In practice, $p_{ij}$ will not be
known. We replace $p_{ij}$ by their estimates. Under $H_0$, we estimate $p_{i\cdot}$ by

(3) $$\hat{p}_{i\cdot} = \sum_{j=1}^c \frac{X_{ij}}{n}, \quad i = 1, 2, \ldots, r,$$

and $p_{\cdot j}$ by

(4) $$\hat{p}_{\cdot j} = \sum_{i=1}^r \frac{X_{ij}}{n}, \quad j = 1, 2, \ldots, c.$$

Since $\sum_j \hat{p}_{\cdot j} = 1 = \sum_i \hat{p}_{i\cdot}$, we have estimated only $r - 1 + c - 1 = r + c - 2$
parameters. It follows (see Theorem 1.3.4) that the RV

(5) $$U = \sum_{i=1}^r \sum_{j=1}^c \frac{(X_{ij} - n\hat{p}_{i\cdot}\hat{p}_{\cdot j})^2}{n\hat{p}_{i\cdot}\hat{p}_{\cdot j}}$$

is asymptotically distributed as $\chi^2$ with $rc - 1 - (r + c - 2) = (r - 1)(c - 1)$
d.f. under $H_0$. The null hypothesis is rejected if the computed value of $U$ exceeds
$\chi^2_{(r-1)(c-1),\alpha}$.

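It is frequently convenient to list the observed and expected frequencies of the $rc$
events $A_i \times B_j$ in an $r \times c$ table, called a contingency table, as follows:

Observed frequency $O_{ij}$:

$$\begin{array}{c|cccc|c}
 & B_1 & B_2 & \cdots & B_c & \text{Total} \\ \hline
A_1 & X_{11} & X_{12} & \cdots & X_{1c} & \sum_j X_{1j} \\
A_2 & X_{21} & X_{22} & \cdots & X_{2c} & \sum_j X_{2j} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
A_r & X_{r1} & X_{r2} & \cdots & X_{rc} & \sum_j X_{rj} \\ \hline
\text{Total} & \sum_i X_{i1} & \sum_i X_{i2} & \cdots & \sum_i X_{ic} & n
\end{array}$$

Expected frequency $E_{ij}$:

$$\begin{array}{c|cccc|c}
 & B_1 & B_2 & \cdots & B_c & \text{Total} \\ \hline
A_1 & n\hat{p}_{1\cdot}\hat{p}_{\cdot 1} & n\hat{p}_{1\cdot}\hat{p}_{\cdot 2} & \cdots & n\hat{p}_{1\cdot}\hat{p}_{\cdot c} & n\hat{p}_{1\cdot} \\
A_2 & n\hat{p}_{2\cdot}\hat{p}_{\cdot 1} & n\hat{p}_{2\cdot}\hat{p}_{\cdot 2} & \cdots & n\hat{p}_{2\cdot}\hat{p}_{\cdot c} & n\hat{p}_{2\cdot} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
A_r & n\hat{p}_{r\cdot}\hat{p}_{\cdot 1} & n\hat{p}_{r\cdot}\hat{p}_{\cdot 2} & \cdots & n\hat{p}_{r\cdot}\hat{p}_{\cdot c} & n\hat{p}_{r\cdot} \\ \hline
\text{Total} & n\hat{p}_{\cdot 1} & n\hat{p}_{\cdot 2} & \cdots & n\hat{p}_{\cdot c} & n
\end{array}$$

Note that the $X_{ij}$'s in the table are frequencies. Once the category $A_i \times B_j$ is
determined for an observation $(X, Y)$, numerical values of $X$ and $Y$ are irrelevant.
Next, we need to compute the expected frequency table. This is done quite simply by
multiplying the row and column totals for each pair $(i, j)$ and dividing the product
by $n$. Then we compute the quantity

$$\sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

and compare it with the tabulated $\chi^2$ value. In this form the test can be applied even
to qualitative data. $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_c$ represent the two attributes,
and the null hypothesis to be tested is that the attributes $A$ and $B$ are independent.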


Example 1. Following are the results for a random sample of 400 employees:

                          Annual Income (dollars)
Time (years) with     Less Than                       More Than
the Same Company       40,000      40,000-75,000       75,000      Total
< 5                      50             75               25         150
5-10                     25             50               25         100
10 or more               25             75               50         150
Total                   100            200              100         400

If $X$ denotes the length of service with the same company, and $Y$, the annual
income, we wish to test the hypothesis that $X$ and $Y$ are independent. The expected
frequencies are as follows:

                          Expected Frequency for Income of:
Time (years) with
the Same Company       < 40,000    40,000-75,000    > 75,000      Total
< 5                      37.5            75            37.5         150
5-10                     25              50            25           100
10 or more               37.5            75            37.5         150
Total                   100             200           100           400

Thus

$$\chi^2 = \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} + 0 + 0 + 0 + \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} = 16.67.$$

The number of degrees of freedom is $(3 - 1)(3 - 1) = 4$, and $\chi^2_{4,0.05} = 9.488$. Since
$16.67 > 9.488$, we reject $H_0$ at level 0.05 and conclude that length of service with a
company is not independent of annual income.
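The chi-square statistic (5) for this table can be computed in a few lines of Python. This is a minimal standard-library sketch. The function name `chi_square_independence` is ours. The code computes each expected frequency as (row total)(column total)/$n$, exactly as described above.

```python
def chi_square_independence(table):
    """Chi-square statistic (5) and its d.f. for an r x c table of observed counts."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    stat = 0.0
    for i in range(r):
        for j in range(c):
            e = row_tot[i] * col_tot[j] / n   # expected frequency n * p_i. * p_.j
            stat += (table[i][j] - e) ** 2 / e
    return stat, (r - 1) * (c - 1)


observed = [[50, 75, 25],    # < 5 years
            [25, 50, 25],    # 5-10 years
            [25, 75, 50]]    # 10 or more years
stat, df = chi_square_independence(observed)
print(round(stat, 2), df)
```

If SciPy is available, `scipy.stats.chi2_contingency` carries out the same computation and also returns the P-value. It applies Yates' continuity correction only when there is one degree of freedom.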


13.5.2 Kendall's Tau 
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population.

Definition 1. For any two pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ we say that the relation is
perfect concordance (or agreement) if

(6) $$X_i < X_j \text{ whenever } Y_i < Y_j \quad \text{or} \quad X_i > X_j \text{ whenever } Y_i > Y_j,$$

and that the relation is perfect discordance (disagreement) if

(7) $$X_i > X_j \text{ whenever } Y_i < Y_j \quad \text{or} \quad X_i < X_j \text{ whenever } Y_i > Y_j.$$

Writing $\pi_c$ and $\pi_d$ for the probability of perfect concordance and of perfect
discordance, respectively, we have

(8) $$\pi_c = P\{(X_j - X_i)(Y_j - Y_i) > 0\}$$

and

(9) $$\pi_d = P\{(X_j - X_i)(Y_j - Y_i) < 0\},$$

and if the marginal distributions of $X$ and $Y$ are continuous,

(10) $$\pi_c = [P\{Y_i < Y_j\} - P\{X_i > X_j \text{ and } Y_i < Y_j\}] + [P\{Y_i > Y_j\} - P\{X_i < X_j \text{ and } Y_i > Y_j\}] = 1 - \pi_d.$$

Definition 2. The measure of association between the RVs $X$ and $Y$ defined by

(11) $$\tau = \pi_c - \pi_d$$

is known as Kendall's tau.

If the marginal distributions of $X$ and $Y$ are continuous, we may rewrite (11), in
view of (10), as follows:

(12) $$\tau = 1 - 2\pi_d = 2\pi_c - 1.$$

In particular, if $X$ and $Y$ are independent and continuous RVs, then

$$P\{X_i < X_j\} = P\{X_i > X_j\} = \tfrac{1}{2},$$

since then $X_i - X_j$ is a symmetric RV. Then

$$\pi_c = P\{X_i < X_j\}P\{Y_i < Y_j\} + P\{X_i > X_j\}P\{Y_i > Y_j\} = P\{X_i > X_j\}P\{Y_i < Y_j\} + P\{X_i < X_j\}P\{Y_i > Y_j\} = \pi_d,$$

and it follows that $\tau = 0$ for independent continuous RVs.

Note that, in general, $\tau = 0$ does not imply independence. However, for the bivariate
normal distribution, $\tau = 0$ if and only if the correlation coefficient $\rho$ between
$X$ and $Y$ is 0, so that $\tau = 0$ if and only if $X$ and $Y$ are independent (Problem 6).

Let

(13) $$\psi((x_1, y_1), (x_2, y_2)) = \begin{cases} 1, & (y_2 - y_1)(x_2 - x_1) > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Then $E\psi((X_1, Y_1), (X_2, Y_2)) = \pi_c = (1 + \tau)/2$, and we see that $\pi_c$ is estimable of
degree 2, with symmetric kernel $\psi$ defined in (13). The corresponding one-sample
U-statistic is given by

(14) $$U((X_1, Y_1), \ldots, (X_n, Y_n)) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \psi((X_i, Y_i), (X_j, Y_j)).$$

Then the corresponding estimator of Kendall's tau is

(15) $$T = 2U - 1$$

and is called Kendall's sample correlation coefficient.

Note that $-1 \le T \le 1$. To test $H_0$ that $X$ and $Y$ are independent against $H_1$: $X$
and $Y$ are dependent, we reject $H_0$ if $|T|$ is large. Under $H_0$, $\tau = 0$, and the null
distribution of $T$ is symmetric about 0. Thus we reject $H_0$ at level $\alpha$ if the observed
value of $T$, $t$, satisfies $|t| > t_{\alpha/2}$, where $P\{|T| > t_{\alpha/2} \mid H_0\} = \alpha$.

For small values of $n$ the null distribution can be evaluated directly. Values for
$4 \le n \le 10$ are tabulated by Kendall [49]. Table ST12 gives the values of $S_\alpha$ for
which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2}T$, for selected values of $n$ and $\alpha$.


For a direct evaluation of the null distribution we note that the numerical value
of $T$ is clearly invariant under all order-preserving transformations. It is therefore
convenient to order $X$ and $Y$ values and assign them ranks. If we write the pairs from
the smallest to the largest according to, say, $X$ values, the number of pairs of values
$1 \le i < j \le n$ for which $Y_j - Y_i > 0$ is the number of concordant pairs, $P$. Then
$U = P/\binom{n}{2}$ and $T = 2P/\binom{n}{2} - 1$.


Example 2. Let $n = 4$, and let us find the null distribution of $T$. There are $4!$
different permutations of ranks of $Y$:

Ranks of X values: 1, 2, 3, 4
Ranks of Y values: a_1, a_2, a_3, a_4

where $(a_1, a_2, a_3, a_4)$ is one of the 24 permutations of 1, 2, 3, 4. Since the distribution
is symmetric about 0, we need only compute one-half of the distribution.

 P      T       Number of Permutations    $P_{H_0}\{T = t\}$
 0    -1.00               1                    1/24
 1    -0.67               3                    3/24
 2    -0.33               5                    5/24
 3     0.00               6                    6/24

Similarly, for $n = 3$, the distribution of $T$ under $H_0$ is as follows:

 P      T       Number of Permutations       $P_{H_0}\{T = t\}$
 0    -1.00     1: (3, 2, 1)                      1/6
 1    -0.33     2: (2, 3, 1), (3, 1, 2)           2/6




Example 3. Two judges rank four essays as follows:

                 Essay
Judge         1    2    3    4
1, X          3    4    2    1
2, Y          3    1    4    2

To test $H_0$: rankings of the two judges are independent, let us arrange the rankings
of the first judge from 1 to 4. Then we have:

Judge 1, X: 1, 2, 3, 4
Judge 2, Y: 2, 4, 3, 1

$P$ = number of pairs of rankings for judge 2 such that for $j > i$, $Y_j - Y_i > 0$ = 2
[the pairs (2, 4) and (2, 3)], and

$$T = \frac{2(2)}{\binom{4}{2}} - 1 = -0.33.$$

Since

$$P_{H_0}\{|T| \ge 0.33\} = \frac{18}{24} = 0.75,$$

we cannot reject $H_0$.
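For small $n$ the null distribution of $T$ can be enumerated directly. The following Python sketch uses helper names of our own and relies on the fact that all $n!$ orderings of the $Y$ ranks are equally likely under $H_0$. It should reproduce the $n = 4$ table of Example 2 and the P-value of Example 3.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb


def kendall_T(y_ranks):
    """Kendall's T = 2U - 1, with the Y ranks listed in increasing order of X.

    P counts the concordant pairs i < j with Y_j > Y_i, so U = P / C(n, 2).
    """
    n = len(y_ranks)
    P = sum(1 for i in range(n) for j in range(i + 1, n) if y_ranks[j] > y_ranks[i])
    return 2 * Fraction(P, comb(n, 2)) - 1


def kendall_null_distribution(n):
    """Exact null distribution of T: all n! orderings of the Y ranks equally likely."""
    counts = Counter(kendall_T(perm) for perm in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {t: Fraction(k, total) for t, k in sorted(counts.items())}


t_obs = kendall_T([2, 4, 3, 1])      # judge 2's ranks, ordered by judge 1
dist4 = kendall_null_distribution(4)
p_two_sided = sum(p for t, p in dist4.items() if abs(t) >= abs(t_obs))
print(t_obs, p_two_sided)
```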


For large $n$ we can use an extension of Theorem 13.3.3 to the bivariate case to
conclude that $\sqrt{n}(U - \pi_c) \xrightarrow{L} N(0, 4\zeta_1)$, where

$$\zeta_1 = \operatorname{cov}[\psi((X_1, Y_1), (X_2, Y_2)),\ \psi((X_1, Y_1), (X_3, Y_3))].$$

Under $H_0$ it can be shown that

$$\frac{3\sqrt{n(n-1)}\;T}{\sqrt{2(2n + 5)}} \xrightarrow{L} N(0, 1).$$

See, for example, Kendall [49], Randles and Wolfe [83], or Gibbons [32]. The
approximation is good for $n \ge 8$.


13.5.3 Spearman's Rank Correlation Coefficient


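Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population. In
Section 7.3 we defined the sample correlation coefficient by

(16) $$R = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2\right]^{1/2}},$$

where

$$\bar{X} = n^{-1}\sum_{i=1}^n X_i \quad \text{and} \quad \bar{Y} = n^{-1}\sum_{i=1}^n Y_i.$$

If the sample values $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are each ranked from 1
to $n$ in increasing order of magnitude separately, and if the $X$'s and $Y$'s have continuous
DFs, we get a unique set of rankings. The data will then reduce to $n$ pairs of
rankings. Let us write

$$R_i = \operatorname{rank}(X_i) \quad \text{and} \quad S_i = \operatorname{rank}(Y_i);$$

then $R_i$ and $S_i \in \{1, 2, \ldots, n\}$. Also,

(17) $$\sum_{i=1}^n R_i = \sum_{i=1}^n S_i = \frac{n(n+1)}{2},$$

(18) $$\bar{R} = n^{-1}\sum_{i=1}^n R_i = \frac{n+1}{2}, \qquad \bar{S} = n^{-1}\sum_{i=1}^n S_i = \frac{n+1}{2},$$

and

(19) $$\sum_{i=1}^n (R_i - \bar{R})^2 = \sum_{i=1}^n (S_i - \bar{S})^2 = \frac{n(n^2 - 1)}{12}.$$

Substituting in (16), with the ranks in place of the observations, we obtain

(20) $$R = \frac{12\sum_{i=1}^n (R_i - \bar{R})(S_i - \bar{S})}{n^3 - n} = \frac{12\sum_{i=1}^n R_iS_i}{n(n^2 - 1)} - \frac{3(n+1)}{n-1}.$$

Writing $D_i = R_i - S_i = (R_i - \bar{R}) - (S_i - \bar{S})$, we have

$$\sum_{i=1}^n D_i^2 = \sum_{i=1}^n (R_i - \bar{R})^2 + \sum_{i=1}^n (S_i - \bar{S})^2 - 2\sum_{i=1}^n (R_i - \bar{R})(S_i - \bar{S}) = \frac{1}{6}(n^3 - n) - 2\sum_{i=1}^n (R_i - \bar{R})(S_i - \bar{S}),$$

and it follows that

(21) $$R = 1 - \frac{6\sum_{i=1}^n D_i^2}{n(n^2 - 1)}.$$

The statistic $R$ defined in (20) and (21) is called Spearman's rank correlation
coefficient (see also Example 4.5.2).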
From (20) we see that

(22) $$ER = \frac{12}{n(n^2 - 1)}\sum_{i=1}^n E(R_iS_i) - \frac{3(n+1)}{n-1}.$$

Under $H_0$, the RVs $X$ and $Y$ are independent, so that the ranks $R_i$ and $S_i$ are also
independent. It follows that

$$E_{H_0}(R_iS_i) = ER_i\,ES_i = \left(\frac{n+1}{2}\right)^2$$

and

(23) $$E_{H_0}R = \frac{12}{n(n^2 - 1)} \cdot n\left(\frac{n+1}{2}\right)^2 - \frac{3(n+1)}{n-1} = 0.$$

Thus we should reject $H_0$ if the absolute value of $R$ is large, that is, reject $H_0$ if

(24) $$|R| > R_\alpha,$$

where $P_{H_0}\{|R| > R_\alpha\} \le \alpha$. To compute $R_\alpha$ we need the null distribution of $R$.

For this purpose it is convenient to assume, without loss of generality, that $R_i = i$,
$i = 1, 2, \ldots, n$. Then $D_i = i - S_i$, $i = 1, 2, \ldots, n$. Under $H_0$, $X$ and $Y$ being
independent, the $n!$ possible rankings $(S_1, S_2, \ldots, S_n)$ are equally likely. It follows that

(25) $$P_{H_0}\{R = r\} = (n!)^{-1} \times (\text{number of rankings for which } R = r) = \frac{n_r}{n!}, \text{ say}.$$

Note that $-1 \le R \le 1$, and the extreme values can occur only when either the
rankings match, that is, $R_i = S_i$, in which case $R = 1$, or $R_i = n + 1 - S_i$, in which
case $R = -1$. Moreover, one need compute only one-half of the distribution, since
it is symmetric about 0 (Problem 7).

In the following example we compute the distribution of $R$ for $n = 3$ and 4. The
exact complete distribution of $\sum D_i^2$, and hence $R$, for $n \le 10$ has been tabulated
by Kendall [49]. Table ST13 gives values of $R_\alpha$ for selected values of $n$ and $\alpha$.
Example 4. Let us first enumerate the null distribution of $R$ for $n = 3$. This is
done in the following table, where $R = 12\sum iS_i/[n(n^2 - 1)] - 3(n+1)/(n-1)$:

(S_1, S_2, S_3)     sum i S_i     R
(1, 2, 3)              14        1.0
(1, 3, 2)              13        0.5
(2, 1, 3)              13        0.5

Thus

$$P_{H_0}\{R = r\} = \begin{cases} 1/6, & r = 1.0, \\ 2/6, & r = 0.5, \\ 2/6, & r = -0.5, \\ 1/6, & r = -1.0. \end{cases}$$

Similarly, for $n = 4$ we have the following:

(S_1, S_2, S_3, S_4)                                  sum i S_i    r     n_r   $P_{H_0}\{R = r\}$
(1, 2, 3, 4)                                              30       1.0    1       1/24
(1, 3, 2, 4), (2, 1, 3, 4), (1, 2, 4, 3)                  29       0.8    3       3/24
(2, 1, 4, 3)                                              28       0.6    1       1/24
(1, 3, 4, 2), (1, 4, 2, 3), (2, 3, 1, 4), (3, 1, 2, 4)    27       0.4    4       4/24
(1, 4, 3, 2), (3, 2, 1, 4)                                26       0.2    2       2/24
                                                          25       0.0    2       2/24

The last value is obtained from symmetry.


Example 5. In Example 3 we see that

$$R = \frac{12 \times 23}{4 \times 15} - \frac{3 \times 5}{3} = 4.6 - 5 = -0.4.$$

Since $P_{H_0}\{|R| \ge 0.4\} = 18/24 = 0.75$, we cannot reject $H_0$ at $\alpha = 0.05$ or
$\alpha = 0.10$.
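The following Python sketch enumerates the null distribution of Spearman's $R$ from (21). It takes $R_i = i$ as in the text and treats all $n!$ rankings $(S_1, \ldots, S_n)$ as equally likely. The helper names are ours. It should reproduce the $n = 4$ table of Example 4 and the P-value of Example 5.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def spearman_R(s):
    """Spearman's R from (21), with R_i = i and s = (S_1, ..., S_n)."""
    n = len(s)
    d2 = sum((i + 1 - si) ** 2 for i, si in enumerate(s))   # sum of D_i^2
    return 1 - Fraction(6 * d2, n ** 3 - n)


def spearman_null_distribution(n):
    """Exact null distribution of R over all n! equally likely rankings."""
    counts = Counter(spearman_R(perm) for perm in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {r: Fraction(k, total) for r, k in sorted(counts.items())}


r_obs = spearman_R([2, 4, 3, 1])     # judge 2's ranks, ordered by judge 1
dist4 = spearman_null_distribution(4)
p_two_sided = sum(p for r, p in dist4.items() if abs(r) >= abs(r_obs))
print(r_obs, p_two_sided)
```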


For large samples it is possible to use a normal approximation. It can be shown
(see, for example, Fraser [29, pp. 247-248]) that under $H_0$ the RV

$$Z = \frac{12\sum_{i=1}^n iS_i - 3n(n+1)^2}{n(n+1)\sqrt{n - 1}}$$

or, equivalently,

$$Z = R\sqrt{n - 1}$$

has approximately a standard normal distribution. The approximation is good for
$n \ge 10$.
PROBLEMS 13.5 


1. A sample of 240 men was classified according to characteristics A and B. Characteristic
A was subdivided into four classes, $A_1$, $A_2$, $A_3$, and $A_4$, while B was
subdivided into three classes, $B_1$, $B_2$, and $B_3$, with the following result:








Is there evidence to support the theory that A and B are independent? 


2. The following data represent the blood types and ethnic groups of a sample of 
Iraqi citizens: 





Blood Type 
Ethnic Group O A B AB
Kurd 531 450 293 226 
Arab 174 150 133 36 
Jew 42 26 26 8 
Turkoman 47 49 22 10 
Ossetian 50 59 26 15 





Is there evidence to conclude that blood type is independent of ethnic group? 


3. In a public opinion poll, a random sample of 500 American adults across the 
country was asked the following question: "Do you believe that there was a con- 
certed effort to cover up the Watergate scandal? Answer yes, no, or no opinion." 
The responses according to political beliefs were as follows: 





Political Response 

Affiliation Yes No No Opinion Total 

Republican 45 75 30 150 

Independent 85 45 20 150 

Democrat 140 30 30 200 
Total 270 150 80 500 


Test the hypothesis that attitude toward the Watergate cover-up is independent of 
political party affiliation. 


4. A random sample of 100 families in Bowling Green, Ohio, showed the following
distribution of home ownership by family income:


                     Annual Income (dollars)
Residential     Less Than     30,000-      50,000
Status          30,000        50,000       or Above
Homeowner         10            15            30
Renter             8            17            20





Is home ownership in Bowling Green independent of family income? 


5. In a flower show the judges agreed that five exhibits were outstanding, and these
were numbered arbitrarily from 1 to 5. Three judges each arranged these five
exhibits in order of merit, giving the following rankings:

Judge A: 5, 3, 1, 2, 4
Judge B: 3, 1, 5, 4, 2
Judge C: 5, 2, 3, 1, 4


Compute the average values of Spearman's rank correlation coefficient R and 
Kendall’s sample tau coefficient T from the three possible pairs of rankings. 


6. For the bivariate normally distributed RV $(X, Y)$, show that $\tau = 0$ if and only if
$X$ and $Y$ are independent. [Hint: Show that $\tau = (2/\pi)\sin^{-1}\rho$, where $\rho$ is the
correlation coefficient between $X$ and $Y$.]


7. Show that the distribution of Spearman's rank correlation coefficient $R$ is symmetric
about 0 under $H_0$.


8. In Problem 5, test the null hypothesis that rankings of judge A and judge C are
independent. Use both Kendall's tau and Spearman's rank correlation tests.


9. A random sample of 12 couples showed the following distribution of heights:







Height (in.) Height (in.) 
Couple Husband Wife | Couple Husband Wife 
1 80 72 7 74 68 
2 70 60 8 71 71 
3 73 76 9 63 61 
4 72 62 10 64 65 
5 62 63 11 68 66 
6 65 46 12 67 67 





(a) Compute T. 
(b) Compute R. 


(c) Test the hypothesis that the heights of husband and wife are independent, 
using T as well as R. In each case use the normal approximation. 


13.6 SOME APPLICATIONS OF ORDER STATISTICS 


In this section we consider some applications of order statistics. We are mainly in- 
terested in three applications: tolerance intervals for distributions, coverages, and 
confidence interval estimates for quantiles and location parameters. 


Definition 1. Let $F$ be a continuous DF. A tolerance interval for $F$ with tolerance
coefficient $\gamma$ is a random interval such that the probability is $\gamma$ that this random
interval covers at least a specified percentage ($100p$) of the distribution.


Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $F$, and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding set of order statistics. If the endpoints of the tolerance interval are two order statistics $X_{(r)}, X_{(s)}$, $r < s$, we have

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p\} = \gamma. \tag{1}$$

Since $F$ is continuous, $F(X)$ is $U(0, 1)$, and we have

$$\begin{aligned} P\{X_{(r)} < X \le X_{(s)}\} &= P\{X \le X_{(s)}\} - P\{X \le X_{(r)}\} \\ &= F(X_{(s)}) - F(X_{(r)}) \\ &= U_{(s)} - U_{(r)}, \end{aligned} \tag{2}$$

where $U_{(r)}, U_{(s)}$ are the order statistics from $U(0, 1)$. Thus (1) reduces to

$$P\{U_{(s)} - U_{(r)} \ge p\} = \gamma. \tag{3}$$


The statistic $V = U_{(s)} - U_{(r)}$, $1 \le r < s \le n$, is called the coverage of the interval $(X_{(r)}, X_{(s)})$. More precisely, the differences $V_k = F(X_{(k)}) - F(X_{(k-1)}) = U_{(k)} - U_{(k-1)}$, for $k = 1, 2, \ldots, n + 1$, where $U_{(0)} = 0$ and $U_{(n+1)} = 1$, are called elementary coverages.
Since the joint PDF of $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ is given by

$$f(u_1, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < u_2 < \cdots < u_n < 1, \\ 0, & \text{otherwise}, \end{cases}$$

the joint PDF of $V_1, V_2, \ldots, V_n$ is easily seen to be

$$h(v_1, v_2, \ldots, v_n) = \begin{cases} n!, & v_i > 0,\ i = 1, 2, \ldots, n,\ \sum_{i=1}^{n} v_i < 1, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$


Note that $h$ is symmetric in its arguments. Consequently, the $V_j$'s are exchangeable RVs and the distribution of every sum of $r$, $r \le n$, of these coverages is the same; in particular, it is the distribution of $U_{(r)} = \sum_{j=1}^{r} V_j$, namely,

$$g_r(u) = \begin{cases} n\dbinom{n-1}{r-1} u^{r-1}(1-u)^{n-r}, & 0 < u < 1, \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$

The common distribution of elementary coverages is

$$g_1(u) = n(1-u)^{n-1}, \quad 0 < u < 1, \qquad = 0 \ \text{otherwise}.$$

Thus $EV_j = 1/(n+1)$ and $\sum_{j=1}^{r} EV_j = r/(n+1)$. This may be interpreted as follows: the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ partition the area under the PDF into $n + 1$ parts such that each part has the same average (expected) area.
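A small simulation makes the statement $EV_j = 1/(n+1)$ concrete. The sketch below is our own illustration, not part of the text: it takes $F$ to be $U(0, 1)$, which loses no generality because the $F(X_{(k)})$ are uniform order statistics for any continuous $F$, and it averages each elementary coverage over many simulated samples. The helper names `elementary_coverages` and `mean_coverages` are hypothetical.

```python
import random

def elementary_coverages(n, rng):
    """Return V_1, ..., V_{n+1} for one sample of size n from U(0, 1).

    V_k = U_(k) - U_(k-1), with U_(0) = 0 and U_(n+1) = 1.
    """
    u = sorted(rng.random() for _ in range(n))
    padded = [0.0] + u + [1.0]
    return [padded[k] - padded[k - 1] for k in range(1, n + 2)]

def mean_coverages(n, reps=20000, seed=1):
    """Average each elementary coverage over `reps` simulated samples."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(reps):
        for k, v in enumerate(elementary_coverages(n, rng)):
            totals[k] += v
    return [t / reps for t in totals]

n = 4
means = mean_coverages(n)
print([round(m, 3) for m in means])   # each entry should be near 1/(n + 1) = 0.2
```

Because the coverages of any one sample telescope to 1, the averaged values always sum to 1; the simulation only shows that the total is shared out evenly on average.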


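The sum of any $r$ successive elementary coverages $V_{i+1}, V_{i+2}, \ldots, V_{i+r}$ is called an $r$-coverage. Clearly,

$$\sum_{j=1}^{r} V_{i+j} = U_{(i+r)} - U_{(i)}, \qquad i + r \le n, \tag{6}$$

and, in particular, $U_{(s)} - U_{(r)} = \sum_{j=r+1}^{s} V_j$. Since the $V$'s are exchangeable, it follows that

$$U_{(s)} - U_{(r)} \stackrel{d}{=} U_{(s-r)} \tag{7}$$

with PDF

$$g_{s-r}(u) = n\binom{n-1}{s-r-1} u^{s-r-1}(1-u)^{n-s+r}, \qquad 0 < u < 1.$$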


From (3), therefore,

$$\gamma = \int_p^1 g_{s-r}(u)\,du = \sum_{i=0}^{s-r-1} \binom{n}{i} p^i (1-p)^{n-i}, \tag{8}$$

where the last equality follows from (5.3.48). Given $n$, $p$, and $\gamma$, it may not always be possible to find $s - r$ to satisfy (8).
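Formula (8) is a plain binomial sum, so the tolerance coefficient of $(X_{(r)}, X_{(s)})$ can be computed directly. The function name `tolerance_coefficient` is our own. The sketch uses only the Python standard library and reproduces the numbers of Examples 1 and 2 below.

```python
from math import comb

def tolerance_coefficient(n, p, d):
    """gamma = P{U_(d) >= p} = sum_{i=0}^{d-1} C(n, i) p^i (1-p)^(n-i), as in (8).

    n: sample size; p: required coverage; d = s - r with 1 <= d <= n.
    """
    if not 1 <= d <= n:
        raise ValueError("need 1 <= s - r <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(d))

# Example 1: r = 1, s = n = 5, p = 0.8, so s - r = 4
print(round(tolerance_coefficient(5, 0.8, 4), 3))                # 0.263
# Example 2: n = 5, p = 0.5, with s - r = 4 and s - r = 5
print([tolerance_coefficient(5, 0.5, d) for d in (4, 5)])        # [0.8125, 0.96875]
```

The two values in the last line are exactly 26/32 and 31/32, which the text rounds to 0.81 and 0.969.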


Example 1. Let $s = n$ and $r = 1$. Then

$$\gamma = \sum_{i=0}^{n-2} \binom{n}{i} p^i (1-p)^{n-i} = 1 - p^n - np^{n-1}(1-p).$$

If $p = 0.8$, $n = 5$, $r = 1$, then

$$\gamma = 1 - (0.8)^5 - 5(0.8)^4(0.2) = 0.263.$$

Thus the interval $(X_{(1)}, X_{(5)})$ in this case defines a 26 percent tolerance interval for 0.80 probability under the distribution (of $X$).


Example 2. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from a continuous DF $F$. Let us find $r$ and $s$, $r < s$, such that $(X_{(r)}, X_{(s)})$ is a 90 percent tolerance interval for 0.50 probability under $F$. We have

$$0.90 = P\{U_{(s-r)} \ge \tfrac{1}{2}\} = \sum_{i=0}^{s-r-1} \binom{5}{i} \left(\tfrac{1}{2}\right)^5.$$

It follows that if we choose $s - r = 4$, then $\gamma = 0.81$; and if we choose $s - r = 5$, then $\gamma = 0.969$. No choice of $s - r$ gives exactly 0.90, so we must settle for an interval with tolerance coefficient 0.969, exceeding the desired value 0.90.


In general, given $p$, $0 < p < 1$, it is possible to choose a sufficiently large sample size $n$ and a corresponding value of $s - r$ such that, with probability $\ge \gamma$, an interval of the form $(X_{(r)}, X_{(s)})$ covers at least $100p$ percent of the distribution. If $s - r$ is specified as a function of $n$, one chooses the smallest such sample size $n$.

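Example 3. Let $p = \tfrac{3}{4}$ and $\gamma = 0.75$. Suppose that we want to choose the smallest sample size required such that $(X_{(2)}, X_{(n)})$ covers at least 75 percent of the distribution with probability at least 0.75. Since $s - r = n - 2$, we want the smallest $n$ to satisfy

$$\sum_{i=0}^{n-3} \binom{n}{i} \left(\tfrac{3}{4}\right)^i \left(\tfrac{1}{4}\right)^{n-i} \ge 0.75,$$

or, equivalently, $P\{Y \le 2\} \le 0.25$ for $Y \sim b(n, \tfrac{1}{4})$. From Table ST1 of binomial distributions we see that $n = 15$: for $n = 14$ the left side is only about 0.719, while for $n = 15$ it is about 0.764.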
We next consider the use of order statistics in constructing confidence intervals for population quantiles. Let $X$ be an RV with a continuous DF $F$, and let $0 < p < 1$. Then the quantile of order $p$, $\xi_p$, satisfies

$$F(\xi_p) = p. \tag{9}$$

Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. Then the number of $X_i$'s $\le \xi_p$ is an RV that has a binomial distribution with parameters $n$ and $p$. Similarly, the number of $X_i$'s that are at least $\xi_p$ has a binomial distribution with parameters $n$ and $1 - p$.

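Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics for the sample. Then

$$P\{X_{(r)} \le \xi_p\} = P\{\text{at least } r \text{ of the } X_i\text{'s} \le \xi_p\} = \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i}. \tag{10}$$

Similarly,

$$\begin{aligned} P\{X_{(s)} \ge \xi_p\} &= P\{\text{at least } n - s + 1 \text{ of the } X_i\text{'s} \ge \xi_p\} \\ &= P\{\text{at most } s - 1 \text{ of the } X_i\text{'s} \le \xi_p\} \\ &= \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}. \end{aligned} \tag{11}$$

It follows from (10) and (11) that

$$\begin{aligned} P\{X_{(r)} \le \xi_p \le X_{(s)}\} &= P\{X_{(r)} \le \xi_p\} - P\{X_{(s)} < \xi_p\} \\ &= P\{X_{(r)} \le \xi_p\} - 1 + P\{X_{(s)} \ge \xi_p\} \\ &= \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} + \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} - 1 \\ &= \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}. \end{aligned} \tag{12}$$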


It is easy to determine a confidence interval for $\xi_p$ from (12) once the confidence level is given. In practice, one determines $r$ and $s$ such that $s - r$ is as small as possible, subject to the condition that the level is at least $1 - \alpha$.
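The trial-and-error search is easy to automate. The sketch below uses our own helpers `coverage` and `shortest_quantile_ci`, not routines from the text. It evaluates (12) for every pair $r < s$ and keeps the pairs with the smallest $s - r$ whose confidence level is at least the requested one.

```python
from math import comb

def coverage(n, p, r, s):
    """P{X_(r) <= xi_p <= X_(s)} = sum_{i=r}^{s-1} C(n, i) p^i (1-p)^(n-i), as in (12)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))

def shortest_quantile_ci(n, p, level):
    """Return (s - r, [(r, s, coverage), ...]) for the narrowest pairs meeting `level`."""
    for width in range(1, n):                      # width = s - r
        hits = [(r, r + width, coverage(n, p, r, r + width))
                for r in range(1, n - width + 1)]
        hits = [h for h in hits if h[2] >= level]
        if hits:
            return width, hits
    return None, []                                # no pair of order statistics suffices

width, pairs = shortest_quantile_ci(7, 0.5, 0.90)  # the setting of Example 4
print(width, [(r, s) for r, s, _ in pairs])        # 5 [(1, 6), (2, 7)]
```

Both pairs found have coverage $119/128 \approx 0.93$, in agreement with Example 4.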


Example 4. Suppose that we want a confidence interval for the median $\xi_{1/2}$, based on a sample of size 7, with confidence level 0.90. It suffices to find $r$ and $s$, $r < s$, such that

$$\sum_{i=r}^{s-1} \binom{7}{i} \left(\tfrac{1}{2}\right)^7 \ge 0.90.$$

By trial and error, using the probability distribution $b(7, \tfrac{1}{2})$, we see that we can choose $s = 7$, $r = 2$ or $r = 1$, $s = 6$. In either case $s - r$ is minimum ($= 5$), and the confidence level is at least 0.92.


Example 5. Let us compute the number of observations required for $(X_{(1)}, X_{(n)})$ to be a 0.95 level confidence interval for the median; that is, we want to find $n$ such that

$$P\{X_{(1)} \le \xi_{1/2} \le X_{(n)}\} \ge 0.95.$$

It suffices to find $n$ such that

$$\sum_{i=1}^{n-1} \binom{n}{i} \left(\tfrac{1}{2}\right)^n = 1 - 2\left(\tfrac{1}{2}\right)^n \ge 0.95.$$

It follows from Table ST1 that $n = 6$.


Finally, we consider applications of order statistics to constructing confidence in- 
tervals for a location parameter. For this purpose we use the method of test inversion 
discussed in Chapter 11. We first consider confidence estimation based on the sign 
test of location. 

Let $X_1, X_2, \ldots, X_n$ be a random sample from a symmetric, continuous DF $F(x - \theta)$ and suppose that we wish to find a confidence interval for $\theta$. Let $R^+(X - \theta_0)$ = number of $X_i$'s $> \theta_0$ be the sign-test statistic for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$. Clearly, $R^+(X - \theta_0) \sim b(n, \tfrac{1}{2})$ under $H_0$. The sign test rejects $H_0$ if

$$\min\{R^+(X - \theta_0),\ R^+(\theta_0 - X)\} \le c \tag{13}$$

for some integer $c$ to be determined from the level of the test. Let $r = c + 1$. Then any value of $\theta$ is acceptable provided that it is greater than the $r$th smallest observation and smaller than the $r$th largest observation, giving as the confidence interval

$$X_{(r)} < \theta < X_{(n+1-r)}. \tag{14}$$

If we want level $1 - \alpha$ to be associated with (14), we choose $c$ so that the level of test (13) is $\alpha$.


Example 6. The following 12 observations come from a symmetric, continuous DF $F(x - \theta)$:

−223, −380, −94, −179, 194, 25, −177, −274, −496, −507, −20, 122.

We wish to obtain a 95 percent confidence interval for $\theta$. The sign test rejects $H_0$ if $R^+(X - \theta_0) \ge 10$ or $R^+(X - \theta_0) \le 2$ at level 0.05. Thus

$$P\{3 \le R^+(X - \theta) \le 9\} = 1 - 2(0.0193) = 0.9614 > 0.95.$$

It follows that a 95 percent confidence interval for $\theta$ is given by $(X_{(3)}, X_{(10)})$, or $(-380, 25)$.
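The inversion (13)–(14) amounts to sorting the data and reading off two order statistics. The following is a minimal sketch applied to the data of Example 6. The helper `sign_test_ci` is hypothetical. It picks $c$ as the largest integer with $2P\{b(n, \tfrac12) \le c\} \le \alpha$, which is one way of making the level of (13) at most $\alpha$. It returns $X_{(r)}$, $X_{(n+1-r)}$, and the exact confidence coefficient.

```python
from math import comb

def sign_test_ci(data, alpha):
    """Distribution-free interval (X_(r), X_(n+1-r)), r = c + 1, for the center of symmetry."""
    x = sorted(data)
    n = len(x)

    def lower_tail(c):                      # P{b(n, 1/2) <= c}
        return sum(comb(n, i) for i in range(c + 1)) / 2**n

    c = -1
    while 2 * lower_tail(c + 1) <= alpha:   # largest c keeping the level of (13) <= alpha
        c += 1
    if c < 0:
        raise ValueError("sample too small for this alpha")
    r = c + 1
    confidence = 1 - 2 * lower_tail(c)
    return x[r - 1], x[n - r], confidence   # X_(r) and X_(n+1-r) in 1-based notation

data = [-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122]
print(sign_test_ci(data, 0.05))             # (-380, 25, 0.96142578125)
```

Here $c = 2$, and the confidence coefficient $3938/4096$ is the value 0.9614 found in Example 6.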


We next consider the Wilcoxon signed-rank test of $H_0: \theta = \theta_0$ to construct a confidence interval for $\theta$. The test statistic in this case is $T^+$ = sum of the ranks of the positive $(X_i - \theta_0)$'s among the ordered $|X_i - \theta_0|$'s. From (13.3.4),

$$T^+ = \sum_{1 \le i \le j \le n} I[X_i + X_j > 2\theta_0] = \text{number of } \frac{X_i + X_j}{2} > \theta_0.$$

Let $T_{ij} = (X_i + X_j)/2$, $1 \le i \le j \le n$, and order the $N = \binom{n+1}{2}$ $T_{ij}$'s in increasing order of magnitude:

$$T_{(1)} < T_{(2)} < \cdots < T_{(N)}.$$

Then, using the argument that converts (13) to (14), we see that a confidence interval for $\theta$ is given by

$$T_{(r)} < \theta < T_{(N+1-r)}, \tag{15}$$

where $r = c + 1$ and the critical values $c$ are taken from Table ST10.


Example 7. For the data in Example 6, the Wilcoxon signed-rank test rejects $H_0: \theta = \theta_0$ at level 0.05 if $T^+ > 64$ or $T^+ < 14$. Thus

$$P\{14 \le T^+(X - \theta_0) \le 64\} \ge 0.95.$$

Here $N = 78$ and $c = 13$, so $r = 14$ and $N + 1 - r = 65$. It follows that a 95% confidence interval for $\theta$ is given by $(T_{(14)}, T_{(65)}) = (-336.5, -14.5)$.
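The Walsh averages $T_{ij}$ and the interval (15) can be computed as shown below. This sketch hard-codes the critical value $c = 13$ used in Example 7 (rejection when $T^+ < 14$ or $T^+ > 64$ for $n = 12$) rather than reproducing Table ST10. The names `walsh_averages` and `walsh_ci` are hypothetical. The median of the same Walsh averages is the Hodges–Lehmann estimate $W$ discussed in Section 13.7.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(data):
    """All N = n(n+1)/2 averages (X_i + X_j)/2 with i <= j, in increasing order."""
    return sorted((a + b) / 2 for a, b in combinations_with_replacement(data, 2))

def walsh_ci(data, c):
    """Interval (T_(r), T_(N+1-r)) of (15) with r = c + 1."""
    t = walsh_averages(data)
    N = len(t)
    r = c + 1
    return t[r - 1], t[N - r]          # 1-based T_(r) and T_(N+1-r)

data = [-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122]
print(walsh_ci(data, c=13))            # endpoints T_(14) and T_(65)
print(median(walsh_averages(data)))    # Hodges-Lehmann estimate W
```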


PROBLEMS 13.6
1. Find the smallest values of $n$ such that the intervals (a) $(X_{(1)}, X_{(n)})$ and (b) $(X_{(2)}, X_{(n-1)})$ contain the median with probability $\ge 0.90$.


2. Find the smallest sample size required such that $(X_{(1)}, X_{(n)})$ covers at least 90 percent of the distribution with probability $\ge 0.98$.


3. Find the relation between $n$ and $p$ such that $(X_{(1)}, X_{(n)})$ covers at least $100p$ percent of the distribution with probability $\ge 1 - p$.


4. Given $\gamma$, $\delta$, $p_0$, $p_1$ with $p_1 > p_0$, find the smallest $n$ such that

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p_0\} \ge \gamma$$

and

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p_1\} \le \delta.$$

Find also $s - r$. [Hint: Use the normal approximation to the binomial distribution.]


5. In Problem 4, find the smallest $n$ and the associated value of $s - r$ if $\gamma = 0.95$, $\delta = 0.10$, $p_1 = 0.75$, and $p_0 = 0.50$.


6. Let X1, X2,... , X; be a random sample from a continuous DF F. Compute: 
(а) Р(Х) < 305 < X(»). 
(b) Р(Х) < 303 < X(5). 
(с) P(X@) < 308 < Х(6)). 
7. Let X1, X2, ... , X, be iid with common continuous DF F. 
(a) What is the distribution of 


F(Xq1) — F(X) + F(Xq@) — F(X) 


for2<i<j<n-—1? 
(b) What is the distribution of [F (Xm) — F(X) IF (Xa) — F(X)? 


13.7 ROBUSTNESS 


Most of the statistical inference problems treated in this book are parametric in na- 
ture. We have assumed that the functional form of the distribution being sampled is 
known except for a finite number of parameters. It is to be expected that any estimator 
or test of hypothesis concerning the unknown parameter constructed on this assump- 
tion will perform better than the corresponding nonparametric procedure, provided 
that the underlying assumptions are satisfied. It is therefore of interest to know how 
well the parametric optimal tests or estimators constructed for one population per- 
form when the basic assumptions are modified. If we can construct tests or estima- 
tors that perform well for a variety of distributions, for example, there would be little 
point in using the corresponding nonparametric method unless the assumptions are 
seriously violated. 

In practice, one makes many assumptions in parametric inference, and any one or 
all of these may be violated. Thus one seldom has accurate knowledge about the true 
underlying distribution. Similarly, the assumption of mutual independence or even 
identical distribution may not hold. Any test or estimator that performs well under 
modifications of underlying assumptions is usually referred to as robust. 

In this section we first consider the effect that slight variations in model assumptions have on some common parametric estimators and tests of hypotheses. Next we consider some corresponding nonparametric competitors and show that they are quite robust.
13.7.1 Effect of Deviations from Model Assumptions on Some Parametric Procedures


Let us first consider the effect of contamination on the sample mean as an estimator of the population mean. The most commonly used estimator of the population mean $\mu$ is the sample mean $\bar X$. It is unbiased for all populations with finite mean. For many parent populations (normal, Poisson, Bernoulli, gamma, etc.) it is a complete sufficient statistic and hence a UMVUE. Moreover, it is consistent and has an asymptotic normal distribution whenever the conditions of the central limit theorem are satisfied. Nevertheless, the sample mean is affected by extreme observations, and a single observation that is either too large or too small may make $\bar X$ worthless as an estimator of $\mu$. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a sample from some normal population. Occasionally, something happens to the system, and a wild observation is obtained. That is, suppose that one is sampling from $N(\mu, \sigma^2)$, say, $100\alpha$ percent of the time and from $N(\mu, k\sigma^2)$, where $k > 1$, $100(1 - \alpha)$ percent of the time. Here both $\mu$ and $\sigma^2$ are unknown, and one wishes to estimate $\mu$. In this case one is really sampling from the density function

$$f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x), \tag{1}$$

where $f_0$ is the PDF of $N(\mu, \sigma^2)$ and $f_1$ is the PDF of $N(\mu, k\sigma^2)$. Clearly,

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{2}$$

is still unbiased for $\mu$. If $\alpha$ is nearly 1, there is no problem, since the underlying distribution is nearly $N(\mu, \sigma^2)$ and $\bar X$ is nearly the UMVUE of $\mu$ with variance $\sigma^2/n$. If $1 - \alpha$ is not nearly 0, then, since one is sampling from $f$, $X_1$ comes from $N(\mu, \sigma^2)$ with probability $\alpha$ and from $N(\mu, k\sigma^2)$ with probability $1 - \alpha$. Both components have mean $\mu$, so $\operatorname{var}(X_1) = \alpha\sigma^2 + (1 - \alpha)k\sigma^2$, and we have

$$\operatorname{var}_f(\bar X) = \frac{1}{n}\operatorname{var}(X_1) = \frac{\sigma^2}{n}[\alpha + (1 - \alpha)k]. \tag{3}$$


If $k(1 - \alpha)$ is large, $\operatorname{var}_f(\bar X)$ is large, and we see that even an occasional wild observation makes $\bar X$ subject to a sizable error. The presence of an occasional observation from $N(\mu, k\sigma^2)$ is frequently referred to as contamination. The problem is that we do not know, in practice, the distribution of the wild observations, and hence we do not know the PDF $f$. It is known that the sample median is a much better estimator than the mean in the presence of extreme values. In the contamination model discussed above, if we use $Z_{1/2}$, the sample median of the $X_i$'s, as an estimator of $\mu$ (which is the population median), then for large $n$

$$E(Z_{1/2} - \mu)^2 = \operatorname{var}(Z_{1/2}) \approx \frac{1}{4n f^2(\mu)} \tag{4}$$
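(see Theorem 7.5.2 and Remark 7.5.7). Since

$$f(\mu) = \alpha f_0(\mu) + (1 - \alpha) f_1(\mu) = \frac{\alpha}{\sigma\sqrt{2\pi}} + \frac{1 - \alpha}{\sigma\sqrt{2\pi k}} = \frac{1}{\sigma\sqrt{2\pi}}\left(\alpha + \frac{1 - \alpha}{\sqrt{k}}\right),$$

we have

$$\operatorname{var}(Z_{1/2}) \approx \frac{\pi\sigma^2}{2n\left[\alpha + (1 - \alpha)/\sqrt{k}\right]^2}. \tag{5}$$

As $k \to \infty$, $\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/(2n\alpha^2)$. If there is no contamination, $\alpha = 1$ and $\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/(2n)$. Also,

$$\frac{\pi\sigma^2/(2n\alpha^2)}{\pi\sigma^2/(2n)} = \frac{1}{\alpha^2},$$

which will be close to 1 if $\alpha$ is close to 1. Thus the estimator $Z_{1/2}$ will not be greatly affected by how large $k$ is, that is, by how wild the observations are. We have

$$\frac{\operatorname{var}(\bar X)}{\operatorname{var}(Z_{1/2})} \approx \frac{2}{\pi}\,[\alpha + (1 - \alpha)k]\left(\alpha + \frac{1 - \alpha}{\sqrt{k}}\right)^2 \to \infty \quad \text{as } k \to \infty.$$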
Indeed, $\operatorname{var}(\bar X) \to \infty$ as $k \to \infty$, whereas $\operatorname{var}(Z_{1/2}) \to \pi\sigma^2/(2n\alpha^2)$ as $k \to \infty$. One can check that when $k = 9$ and $\alpha \approx 0.90$, the two variances are (approximately) equal. As $k$ becomes larger than 9 or $\alpha$ smaller than 0.90, $Z_{1/2}$ becomes a better estimator of $\mu$ than $\bar X$.
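The value of $\alpha$ at which the two large-sample variances coincide can be located numerically from the approximate ratio $\operatorname{var}(\bar X)/\operatorname{var}(Z_{1/2})$ displayed above. The sketch below is our own illustration using plain bisection. For $k = 9$ the ratio exceeds 1 at $\alpha = 0.5$ and equals $2/\pi < 1$ at $\alpha = 1$, and it crosses 1 only once in between, so bisection on $[0.5, 1]$ finds the crossing.

```python
from math import pi, sqrt

def variance_ratio(alpha, k):
    """Large-sample var(mean) / var(median) under the contamination model (1)."""
    return (2 / pi) * (alpha + (1 - alpha) * k) * (alpha + (1 - alpha) / sqrt(k)) ** 2

def crossover_alpha(k, lo=0.5, hi=1.0, tol=1e-10):
    """Bisection for the alpha at which mean and median have equal asymptotic variance."""
    # assumes variance_ratio(lo, k) > 1 (median better) and variance_ratio(hi, k) < 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if variance_ratio(mid, k) > 1:
            lo = mid                      # median still better: crossing lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

alpha_star = crossover_alpha(9)
print(round(alpha_star, 4), variance_ratio(alpha_star, 9))
```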

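There are other flaws as well. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a sample from $U(0, \theta)$, $\theta > 0$. Then both $\bar X$ and $T(X) = (X_{(1)} + X_{(n)})/2$, where $X_{(1)} = \min(X_1, \ldots, X_n)$ and $X_{(n)} = \max(X_1, \ldots, X_n)$, are unbiased for $EX = \theta/2$. Also, $\operatorname{var}_\theta(\bar X) = \operatorname{var}(X)/n = \theta^2/(12n)$, and one can show that $\operatorname{var}_\theta(T) = \theta^2/[2(n + 1)(n + 2)]$. It follows that the efficiency of $\bar X$ relative to that of $T$ is

$$\operatorname{eff}_\theta(\bar X \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(\bar X)} = \frac{6n}{(n + 1)(n + 2)} < 1 \quad \text{if } n > 2.$$

In fact, $\operatorname{eff}_\theta(\bar X \mid T) \to 0$ as $n \to \infty$, so that in sampling from a uniform parent $\bar X$ is much worse than $T$, even for moderately large values of $n$.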
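A quick Monte Carlo run compares the empirical variance ratio of the midrange $T$ and the mean $\bar X$ with $6n/[(n+1)(n+2)]$. This is a sketch under our own choices of $\theta = 1$, the number of replications, and the seed. The function name `simulated_efficiency` is hypothetical.

```python
import random
from statistics import pvariance

def simulated_efficiency(n, theta=1.0, reps=20000, seed=2):
    """Monte Carlo estimate of var(T) / var(mean) for samples of size n from U(0, theta)."""
    rng = random.Random(seed)
    means, mids = [], []
    for _ in range(reps):
        x = [rng.uniform(0, theta) for _ in range(n)]
        means.append(sum(x) / n)
        mids.append((min(x) + max(x)) / 2)   # T = (X_(1) + X_(n)) / 2
    return pvariance(mids) / pvariance(means)

n = 10
print(simulated_efficiency(n), 6 * n / ((n + 1) * (n + 2)))   # simulated vs exact
```

With $n = 10$ the exact value is $60/132 \approx 0.45$, so the midrange needs less than half as many observations as the mean for the same precision.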

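Let us next turn our attention to the estimation of the standard deviation. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then the MLE of $\sigma$ is

$$\hat\sigma = \left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2\right]^{1/2} = \left(\frac{n - 1}{n}\right)^{1/2} S. \tag{6}$$

Note that the lower bound for the variance of any unbiased estimator of $\sigma$ is $\sigma^2/(2n)$. Although $\hat\sigma$ is not unbiased, the estimator

$$S_1 = \sqrt{\frac{n - 1}{2}}\,\frac{\Gamma[(n - 1)/2]}{\Gamma(n/2)}\,S = \sqrt{\frac{n}{2}}\,\frac{\Gamma[(n - 1)/2]}{\Gamma(n/2)}\,\hat\sigma \tag{7}$$

is unbiased for $\sigma$. Also,

$$\operatorname{var}(S_1) = \sigma^2\left\{\frac{n - 1}{2}\left[\frac{\Gamma[(n - 1)/2]}{\Gamma(n/2)}\right]^2 - 1\right\} = \frac{\sigma^2}{2n} + O\!\left(\frac{1}{n^2}\right).$$

Thus the efficiency of $S_1$ (relative to the estimator with least variance $= \sigma^2/(2n)$) is

$$\operatorname{eff}(S_1) = \frac{\sigma^2/(2n)}{\operatorname{var}(S_1)} = \frac{1}{1 + O(1/n)} < 1 \tag{8}$$

and $\operatorname{eff}(S_1) \to 1$ as $n \to \infty$. For small $n$, the efficiency of $S_1$ is considerably smaller than 1. Thus, for $n = 2$, $\operatorname{eff}(S_1) = 1/[2(\pi - 2)] = 0.438$, and for $n = 3$, $\operatorname{eff}(S_1) = \pi/[6(4 - \pi)] = 0.61$.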
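The closed forms quoted for $n = 2$ and $n = 3$ can be checked against the exact variance of $S_1$. The sketch below is our own. It evaluates (8) exactly, using `math.lgamma` so that the gamma ratio stays stable for large $n$, and prints the two closed forms for comparison.

```python
from math import exp, lgamma, pi

def eff_S1(n):
    """Exact efficiency (8) of the unbiased estimator S_1 of sigma for N(mu, sigma^2) samples."""
    ratio = exp(lgamma((n - 1) / 2) - lgamma(n / 2))     # Gamma((n-1)/2) / Gamma(n/2)
    var_over_sigma2 = (n - 1) / 2 * ratio**2 - 1          # var(S_1) / sigma^2
    return (1 / (2 * n)) / var_over_sigma2

for n in (2, 3, 10, 100):
    print(n, round(eff_S1(n), 3))
print(1 / (2 * (pi - 2)), pi / (6 * (4 - pi)))           # closed forms for n = 2 and n = 3
```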

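Yet another estimator of $\sigma$ is the sample mean deviation

$$S_2 = \frac{1}{n}\sum_{i=1}^{n} |X_i - \bar X|. \tag{9}$$

Note that

$$E\left[\frac{1}{n}\sum_{i=1}^{n} |X_i - \mu|\right] = E|X_1 - \mu| = \sigma\sqrt{\frac{2}{\pi}}$$

and

$$\operatorname{var}\left[\frac{1}{n}\sum_{i=1}^{n} |X_i - \mu|\right] = \frac{1}{n}\left(1 - \frac{2}{\pi}\right)\sigma^2. \tag{10}$$

If $n$ is large enough that $\bar X \approx \mu$, we see that $S_3 = \sqrt{\pi/2}\, S_2$ is nearly unbiased for $\sigma$ with variance $[(\pi - 2)/(2n)]\sigma^2$. The efficiency of $S_3$ is

$$\operatorname{eff}(S_3) = \frac{\sigma^2/(2n)}{(\pi - 2)\sigma^2/(2n)} = \frac{1}{\pi - 2} \approx 0.876.$$

For large $n$, the efficiency of $S_1$ relative to $S_3$ is

$$\frac{\operatorname{var}(S_3)}{\operatorname{var}(S_1)} = \frac{[(\pi - 2)/(2n)]\sigma^2}{\sigma^2/(2n) + O(1/n^2)} \to \pi - 2 \approx 1.14.$$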


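Now suppose that there is some contamination. As before, let us suppose that for a proportion $\alpha$ of the time we sample from $N(\mu, \sigma^2)$, and for a proportion $1 - \alpha$ of the time we get a wild observation from $N(\mu, k\sigma^2)$, $k > 1$. Assuming that both $\mu$ and $\sigma^2$ are unknown, suppose that we wish to estimate $\sigma$. In the notation used above, let

$$f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),$$

where $f_0$ is the PDF of $N(\mu, \sigma^2)$ and $f_1$ is the PDF of $N(\mu, k\sigma^2)$. Let us see how even small contamination can make the maximum likelihood estimator $\hat\sigma$ of $\sigma$ quite useless.

If $\hat\theta$ is the MLE of $\theta$ and $\varphi$ is a function of $\theta$, then $\varphi(\hat\theta)$ is the MLE of $\varphi(\theta)$. In view of (7.5.7) we get

$$E(\hat\sigma - \sigma)^2 \approx \frac{1}{4\sigma^2}\,E(\hat\sigma^2 - \sigma^2)^2. \tag{11}$$

Using Theorem 7.3.5, we see that

$$E(\hat\sigma^2 - \sigma^2)^2 \approx \frac{\mu_4 - \mu_2^2}{n} \tag{12}$$

(dropping the other two terms with $n^2$ and $n^3$ in the denominator), so that

$$E(\hat\sigma - \sigma)^2 \approx \frac{1}{4n\sigma^2}\,(\mu_4 - \mu_2^2). \tag{13}$$

For the density $f$, we see that

$$\mu_4 = 3\sigma^4[\alpha + k^2(1 - \alpha)] \tag{14}$$

and

$$\mu_2 = \sigma^2[\alpha + k(1 - \alpha)]. \tag{15}$$

It follows that

$$E(\hat\sigma - \sigma)^2 \approx \frac{\sigma^2}{4n}\left\{3[\alpha + k^2(1 - \alpha)] - [\alpha + k(1 - \alpha)]^2\right\}. \tag{16}$$

If we are interested in the effect of very small contamination, $\alpha \approx 1$ and $1 - \alpha \approx 0$. Assuming that $k(1 - \alpha) \approx 0$, we see that

$$E(\hat\sigma - \sigma)^2 \approx \frac{\sigma^2}{4n}\left\{3[1 + k^2(1 - \alpha)] - 1\right\} = \frac{\sigma^2}{2n}\left[1 + \tfrac{3}{2}k^2(1 - \alpha)\right]. \tag{17}$$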
In the normal case, $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$, so that from (13)

$$E(\hat\sigma - \sigma)^2 \approx \frac{\sigma^2}{2n}.$$

Thus we see that the mean square error due to a small contamination is now multiplied by a factor $[1 + \tfrac{3}{2}k^2(1 - \alpha)]$. If, for example, $k = 10$ and $\alpha = 0.99$, then $1 + \tfrac{3}{2}k^2(1 - \alpha) = \tfrac{5}{2}$. If $k = 10$ and $\alpha = 0.98$, then $1 + \tfrac{3}{2}k^2(1 - \alpha) = 4$, and so on.

A quick comparison with $S_3$ shows that although $S_1$ (or even $\hat\sigma$) is a better estimator of $\sigma$ than $S_3$ if there is no contamination, $S_3$ becomes a much better estimator in the presence of contamination as $k$ becomes large.
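The two approximations (16) and (17) can be compared directly. Both functions below are our own names. Each returns the approximate mean square error of $\hat\sigma$ as a multiple of its uncontaminated value $\sigma^2/(2n)$. For $k = 10$ the simplified factor from (17) is a little larger than the one from (16), because (17) drops the terms in $k(1 - \alpha)$.

```python
def mse_factor_exact(k, alpha):
    """Approximation (16) divided by the uncontaminated value sigma^2 / (2n)."""
    beta = 1 - alpha
    return (3 * (alpha + k**2 * beta) - (alpha + k * beta) ** 2) / 2

def mse_factor_small(k, alpha):
    """Small-contamination factor 1 + (3/2) k^2 (1 - alpha) from (17)."""
    return 1 + 1.5 * k**2 * (1 - alpha)

for alpha in (1.0, 0.99, 0.98):
    print(alpha, round(mse_factor_exact(10, alpha), 3), round(mse_factor_small(10, alpha), 3))
```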

Next we consider the effect of deviation from model assumptions on tests of hypotheses. One of the most commonly used tests in statistics is Student's t-test for testing the mean of a normal population when the variance is unknown. Let $X_1, X_2, \ldots, X_n$ be a sample from some population with mean $\mu$ and finite variance $\sigma^2$. As usual, let $\bar X$ denote the sample mean and $S^2$ the sample variance. If the population being sampled is normal, the t-test rejects $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$ at level $\alpha$ if $|\bar X - \mu_0| > t_{n-1,\alpha/2}\,(S/\sqrt{n})$. If $n$ is large, we replace $t_{n-1,\alpha/2}$ by the corresponding critical value, $z_{\alpha/2}$, under the standard normal law. If the sample does not come from a normal population, the statistic $T = \sqrt{n}(\bar X - \mu_0)/S$ is no longer distributed as a $t(n - 1)$ statistic. If, however, $n$ is sufficiently large, we know that $T$ has an asymptotic normal distribution irrespective of the population being sampled, as long as it has a finite variance. Thus, for large $n$, the distribution of $T$ is independent of the form of the population, and the t-test is stable. The same considerations apply to testing the difference between two means when the two variances are equal. Although we assumed that $n$ is sufficiently large for Slutsky's result (Theorem 6.2.15) to hold, empirical investigations have shown that the test based on Student's statistic is robust. Thus a significant value of $t$ should not be interpreted as indicating a departure from normality of the observations. Let us next consider the effect of departure from independence on the t-distribution. Suppose that the observations $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and $\rho$ as the common correlation coefficient between any $X_i$ and $X_j$, $i \ne j$. Then

$$E\bar X = \mu \quad \text{and} \quad \operatorname{var}(\bar X) = \frac{\sigma^2}{n}[1 + (n - 1)\rho], \tag{18}$$

and since the $X_i$'s are exchangeable, it follows from Remark 7.3.1 that

$$ES^2 = \sigma^2(1 - \rho). \tag{19}$$

For large $n$, the statistic $\sqrt{n}(\bar X - \mu_0)/S$ will be asymptotically distributed as $N(0, 1 + n\rho/(1 - \rho))$, instead of $N(0, 1)$. Under $H_0$, $\rho = 0$ and $T^2 = n(\bar X - \mu_0)^2/S^2$ is distributed as $F(1, n - 1)$. Consider the ratio

$$\frac{nE(\bar X - \mu_0)^2}{ES^2} = \frac{\sigma^2[1 + (n - 1)\rho]}{\sigma^2(1 - \rho)} = 1 + \frac{n\rho}{1 - \rho}. \tag{20}$$
The ratio equals 1 if $\rho = 0$ but is $> 1$ for $\rho > 0$ and $\to \infty$ as $\rho \to 1$. It follows that a large value of $T$ is likely to occur when $\rho > 0$ and is large, even though $\mu_0$ is the true value of the mean. Thus a significant value of $t$ may be due to departure from independence, and the effect can be serious.
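The inflation factor (20) can be seen in a simulation. The following sketch is our own and assumes $\mu = \mu_0 = 0$ and $\sigma = 1$, which loses no generality because the ratio is scale free. It builds equicorrelated normals as $X_i = \sqrt{\rho}\,Z_0 + \sqrt{1 - \rho}\,Z_i$ with independent standard normals $Z_0, Z_1, \ldots, Z_n$, which gives $\operatorname{var}(X_i) = 1$ and $\operatorname{corr}(X_i, X_j) = \rho$.

```python
import random

def inflation_ratio(n, rho, reps=20000, seed=3):
    """Monte Carlo estimate of n E(Xbar - mu)^2 / E S^2 for equicorrelated N(0, 1) data."""
    rng = random.Random(seed)
    a, b = rho**0.5, (1 - rho) ** 0.5
    num_total = den_total = 0.0
    for _ in range(reps):
        z0 = rng.gauss(0.0, 1.0)                        # shared term: corr(X_i, X_j) = rho
        x = [a * z0 + b * rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        num_total += n * xbar**2                        # mu = mu_0 = 0
        den_total += sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sample variance S^2
    return num_total / den_total

n, rho = 10, 0.2
print(inflation_ratio(n, rho), 1 + n * rho / (1 - rho))   # simulated vs formula (20)
```

Even a modest common correlation of 0.2 with $n = 10$ multiplies the expected numerator of $T^2$ by about 3.5 relative to its denominator.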

Next, consider a test of the null hypothesis $H_0: \sigma = \sigma_0$ against $H_1: \sigma \ne \sigma_0$. Under the usual normality assumptions on the observations $X_1, X_2, \ldots, X_n$, the test statistic used is

$$V_0 = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{\sigma_0^2}, \tag{21}$$

which has a $\chi^2(n - 1)$ distribution under $H_0$. The usual test is to reject $H_0$ if

$$V_0 = \frac{(n - 1)S^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2} \quad \text{or} \quad V_0 < \chi^2_{n-1,1-\alpha/2}. \tag{22}$$

Let us suppose that $X_1, X_2, \ldots, X_n$ are not normal. It follows from Corollary 2 of Theorem 7.3.5 that

$$\operatorname{var}(S^2) = \frac{\mu_4}{n} + \frac{3 - n}{n(n - 1)}\,\mu_2^2, \tag{23}$$

so that

$$\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) = \frac{\mu_4}{n\sigma^4} + \frac{3 - n}{n(n - 1)}. \tag{24}$$

Writing $\gamma_2 = (\mu_4/\sigma^4) - 3$, we have

$$\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) = \frac{2}{n - 1} + \frac{\gamma_2}{n} \tag{25}$$

when the $X_i$'s are not normal, and

$$\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) = \frac{2}{n - 1} \tag{26}$$

when the $X_i$'s are normal ($\gamma_2 = 0$). Now $(n - 1)S^2 = \sum_{j=1}^{n}(X_j - \bar X)^2$ is the sum of $n$ identically distributed but dependent RVs $(X_j - \bar X)^2$, $j = 1, 2, \ldots, n$. Using a version of the central limit theorem for dependent RVs (see, e.g., Cramér [16, p. 365]), it follows that

$$\left(\frac{n - 1}{2}\right)^{1/2}\left(\frac{S^2}{\sigma_0^2} - 1\right),$$

under $H_0$, is asymptotically $N(0, 1 + \gamma_2/2)$, and not $N(0, 1)$ as under the normal theory. As a result, the size of the test based on the statistic $V_0$ will be different from the stated level of significance if $\gamma_2$ differs greatly from 0. It is clear that the effect of violating the normality assumption on inferences about variances can be quite serious, and the chi-square test is not robust.
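To see how large the size distortion can be, one can use the $N(0, 1 + \gamma_2/2)$ limit directly. The sketch below is our own large-sample approximation. It replaces the chi-square cutoffs in (22) by their normal approximations $\pm z_{\alpha/2}$ on the standardized scale and reports the true asymptotic rejection probability. The excess kurtosis values $-1.2$ (uniform) and $3$ (double exponential) are standard facts supplied by us.

```python
from statistics import NormalDist

def asymptotic_size(gamma2, level=0.05):
    """Large-sample size of the nominal two-sided variance test when the excess kurtosis is gamma2."""
    if gamma2 <= -2:
        raise ValueError("excess kurtosis must exceed -2")
    z = NormalDist().inv_cdf(1 - level / 2)     # normal-theory cutoff z_{alpha/2}
    scale = (1 + gamma2 / 2) ** 0.5             # true asymptotic sd of the standardized statistic
    return 2 * (1 - NormalDist().cdf(z / scale))

for name, g2 in [("normal", 0.0), ("uniform", -1.2), ("double exponential", 3.0)]:
    print(name, round(asymptotic_size(g2), 4))
```

A nominal 5 percent test thus rejects a true $H_0$ far too rarely for light-tailed data and roughly four times too often for double exponential data.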

In the discussion above we have used somewhat crude calculations to investigate 
the behavior of the most commonly used estimators and test statistics when one or 
more of the underlying assumptions are violated. Our purpose here was to indicate 
that some tests or estimators are robust, whereas others are not. The moral is clear: 
One should check carefully to see that the underlying assumptions are satisfied be- 
fore using parametric procedures. 


13.7.3 Some Robust Procedures 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous PDF $f(x - \theta)$, $\theta \in \mathbb{R}$, and assume that $f$ is symmetric about 0. We shall be interested in estimation or tests of hypotheses concerning $\theta$. Our objective is to find procedures that perform well for several different types of distributions but do not have to be optimal for any particular distribution. We will call such procedures robust. We first consider estimation of $\theta$. The estimators fall under one of the following three types:


1. Estimators that are functions of $R = (R_1, R_2, \ldots, R_n)$, where $R_j$ is the rank of $X_j$, are known as R-estimators. Hodges and Lehmann [41] devised a method of deriving such estimators from rank tests. These include the sample median $\tilde X$ (based on the sign test) and $W = \operatorname{med}\{(X_i + X_j)/2,\ 1 \le i \le j \le n\}$ (based on the Wilcoxon signed-rank test).

2. Estimators of the form $\sum_{i=1}^{n} a_i X_{(i)}$ are called L-estimators, being linear combinations of order statistics. This class includes the median, the mean, and the trimmed mean obtained by dropping a prespecified proportion of extreme observations.

3. Maximum likelihood type estimators obtained as solutions to certain equations $\sum_{i=1}^{n} \psi(X_i - \theta) = 0$ are called M-estimators. The function $\psi(t) = -f'(t)/f(t)$ gives MLEs.
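As an illustration of item 3, the sketch below solves $\sum \psi(X_i - \theta) = 0$ by bisection. The text does not fix a particular $\psi$. We assume Huber's function $\psi_c(t) = \max(-c, \min(c, t))$ with the conventional tuning constant $c = 1.345$, and we treat the scale as known and equal to 1. Because $\psi_c$ is nondecreasing, the left side is nonincreasing in $\theta$, so bisection between the smallest and largest observations converges. The names `huber_psi` and `m_estimate` are hypothetical.

```python
def huber_psi(t, c=1.345):
    """Huber's psi (an assumed choice): identity on [-c, c], clipped to +-c outside."""
    return max(-c, min(c, t))

def m_estimate(data, c=1.345, tol=1e-12):
    """Solve sum psi(X_i - theta) = 0 by bisection; scale assumed known and equal to 1."""
    lo, hi = min(data), max(data)      # the sum is >= 0 at lo and <= 0 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(huber_psi(x - mid, c) for x in data) > 0:
            lo = mid                   # root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [1, 2, 3, 4, 100]
print(m_estimate(data))                # the outlier 100 enters only through psi = c
print(m_estimate(data, c=1e9))         # a huge c makes psi(t) = t, giving the sample mean 22
```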


Definition 1. Let $k = [n\alpha]$ be the largest integer $\le n\alpha$, where $0 < \alpha < \tfrac{1}{2}$. Then the estimator

$$\bar X_\alpha = \frac{1}{n - 2k}\sum_{j=k+1}^{n-k} X_{(j)} \tag{27}$$

is called a trimmed mean.
Two extreme examples of trimmed means are the sample mean $\bar X$ ($\alpha = 0$) and the median $\tilde X$, obtained when all except the central ($n$ odd) or the two central ($n$ even) observations are excluded.
Example 1. Consider the following sample of size 15 taken from a symmetric distribution:

0.97  0.66  0.73  0.78  1.30  0.58  0.79  0.94
0.52  0.52  0.83  1.25  1.47  0.96  0.71

Suppose that $\alpha = 0.10$. Then $k = [n\alpha] = 1$ and

$$\bar x_{0.10} = \frac{\sum_{j=2}^{14} x_{(j)}}{15 - 2} = 0.85.$$

Here $\bar x = 0.867$ and $\operatorname{med}_{1 \le j \le 15} x_j = x_{(8)} = 0.79$.
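The numbers in Example 1 can be reproduced with a few lines. The sketch below is our own. It implements (27) directly and also computes the Hodges–Lehmann estimate $W$ (the median of the Walsh averages) for the same sample. Note that `int(n * alpha)` implements $[n\alpha]$ only up to floating-point rounding of the product.

```python
from itertools import combinations_with_replacement
from statistics import mean, median

def trimmed_mean(data, alpha):
    """(27): drop the k = [n*alpha] smallest and k largest observations, average the rest."""
    x = sorted(data)
    n = len(x)
    k = int(n * alpha)                  # floor of n*alpha for alpha >= 0
    return sum(x[k:n - k]) / (n - 2 * k)

def hodges_lehmann(data):
    """W = median of the Walsh averages (X_i + X_j)/2, i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(data, 2))

sample = [0.97, 0.66, 0.73, 0.78, 1.30, 0.58, 0.79, 0.94,
          0.52, 0.52, 0.83, 1.25, 1.47, 0.96, 0.71]
print(round(trimmed_mean(sample, 0.10), 2))     # 0.85
print(round(mean(sample), 3), median(sample))   # 0.867 0.79
print(round(hodges_lehmann(sample), 3))
```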


We limit this discussion to four estimators of location: the sample median, the trimmed mean, the sample mean, and the Hodges–Lehmann estimator based on the Wilcoxon signed-rank test. To compare the performance of two procedures A and B, we use a (large-sample) measure of relative efficiency due to Pitman. Pitman's asymptotic relative efficiency (ARE) of procedure B relative to procedure A is the limit of the ratio of sample sizes $n_A/n_B$, where $n_A$ and $n_B$ are the sample sizes needed for procedures A and B to perform equivalently with respect to a specified criterion. For example, suppose that $\{T_{n(A)}\}$ and $\{T_{n(B)}\}$ are two sequences of estimators for $\gamma(\theta)$ such that

$$T_{n(A)} \sim AN\left(\gamma(\theta), \frac{\sigma_A^2(\theta)}{n(A)}\right) \quad \text{and} \quad T_{n(B)} \sim AN\left(\gamma(\theta), \frac{\sigma_B^2(\theta)}{n(B)}\right).$$

Suppose further that A and B perform equivalently if their asymptotic variances are the same, that is,

$$\frac{\sigma_A^2(\theta)}{n(A)} = \frac{\sigma_B^2(\theta)}{n(B)}.$$

Then

$$\frac{n(A)}{n(B)} = \frac{\sigma_A^2(\theta)}{\sigma_B^2(\theta)}.$$


Clearly, different performance measures may lead to different measures of ARE.

Similarly, if procedures A and B lead to two sequences of tests, then the ARE is the limiting ratio of the sample sizes needed by the tests to reach a certain power $\beta$ against the same alternative and at the same limiting level $\alpha$.
Accordingly, let $e(B, A)$ denote the ARE of B relative to A. If $e(B, A) = \tfrac{1}{2}$, say, then procedure A requires (approximately) half as many observations as procedure B. We will write $e_F(B, A)$ whenever necessary to indicate the dependence of ARE on the underlying DF $F$.

For detailed discussion of Pitman efficiency we refer to Lehmann [59, pp. 371–380], Lehmann [61, Sec. 5.2], Randles and Wolfe [83, Chap. 5], Serfling [100, Chap. 10], and Zacks [120]. The expressions for the AREs of the median and the Hodges–Lehmann estimator of the location parameter $\theta$ with respect to the sample mean $\bar X$ are

$$e_F(\tilde X, \bar X) = 4\sigma^2 f^2(0) \tag{28}$$

and

$$e_F(W, \bar X) = 12\sigma^2\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2, \tag{29}$$

where $f$ is the PDF corresponding to $F$. To get $e_F(\tilde X, W)$ we use the fact that

$$e_F(\tilde X, W) = \frac{e_F(\tilde X, \bar X)}{e_F(W, \bar X)} = \frac{f^2(0)}{3\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2}. \tag{30}$$

Bickel [4] showed that

$$e_F(\bar X_\alpha, \bar X) = \frac{\sigma^2}{\sigma_\alpha^2}, \tag{31}$$

where

$$\sigma_\alpha^2 = \frac{2}{(1 - 2\alpha)^2}\left[\int_0^{\xi_{1-\alpha}} t^2 f(t)\,dt + \alpha\,\xi_{1-\alpha}^2\right] \tag{32}$$

and $\xi_\alpha$ is the unique $\alpha$th percentile of $F$. It is clear from (32) that no closed-form expression for $e_F(\bar X_\alpha, \bar X)$ is possible for most DFs $F$.
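Since (32) usually has no closed form, a numerical evaluation is the practical route. The sketch below is our own. It integrates (32) with a composite Simpson rule, which is our choice of quadrature, for three symmetric densities standardized as in the ARE table that follows. The printed values can be compared with Bickel's figures in the ARE Comparisons table later in this section.

```python
from math import exp, log
from statistics import NormalDist

def simpson(g, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = g(a) + g(b)
    s += 4 * sum(g(a + i * h) for i in range(1, m, 2))
    s += 2 * sum(g(a + i * h) for i in range(2, m, 2))
    return s * h / 3

def trimmed_are(pdf, quantile, var, alpha):
    """e_F(trimmed mean, mean) = sigma^2 / sigma_alpha^2 via (31)-(32); f symmetric about 0."""
    xi = quantile(1 - alpha)                                   # xi_{1 - alpha}
    integral = simpson(lambda t: t * t * pdf(t), 0.0, xi)      # int_0^xi t^2 f(t) dt
    sigma2_alpha = 2 / (1 - 2 * alpha) ** 2 * (integral + alpha * xi**2)
    return var / sigma2_alpha

N01 = NormalDist()
cases = {   # (pdf on [0, xi], quantile function, variance sigma^2)
    "uniform U(-1/2, 1/2)": (lambda t: 1.0, lambda q: q - 0.5, 1 / 12),
    "normal N(0, 1)": (N01.pdf, N01.inv_cdf, 1.0),
    "double exponential": (lambda t: 0.5 * exp(-abs(t)), lambda q: -log(2 * (1 - q)), 2.0),
}
for name, (pdf, quantile, var) in cases.items():
    print(name, [round(trimmed_are(pdf, quantile, var, a), 3) for a in (0.01, 0.05)])
```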
In the following table we give the AREs for some selected F. 
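ARE Computations for Selected F

$$\begin{array}{lccc} F & e(\tilde X, \bar X) & e(W, \bar X) & e(\tilde X, W) \\ \hline U(-\tfrac{1}{2}, \tfrac{1}{2}) & 1/3 & 1 & 1/3 \\ N(0, 1) & 2/\pi = 0.637 & 3/\pi = 0.955 & 2/3 \\ \text{Logistic, } f(x) = e^{-x}(1 + e^{-x})^{-2} & \pi^2/12 = 0.822 & \pi^2/9 = 1.10 & 3/4 = 0.75 \\ \text{Double exponential, } f(x) = \tfrac{1}{2}e^{-|x|} & 2 & 1.5 & 4/3 \\ C(0, 1) & \infty & \infty & 4/3 \end{array}$$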
It can be shown that $e_F(\tilde X, \bar X) \ge 1/3$ for all symmetric unimodal $F$. The bound is attained by the uniform distribution, so $\tilde X$ is quite inefficient compared to $\bar X$ for $U(-\tfrac{1}{2}, \tfrac{1}{2})$. Even for normal $F$, $\tilde X$ would require 157 observations to achieve the same accuracy that $\bar X$ achieves with 100 observations. For heavier-tailed distributions, however, $\tilde X$ provides more protection than $\bar X$.

The values of $e(W, \bar X)$, on the other hand, are quite high for most $F$ and, in fact, $e_F(W, \bar X) \ge 0.864$ for all symmetric $F$. Even for normal $F$ one loses little (4.5%) in using $W$ instead of $\bar X$. Thus $W$ is more robust as an estimator of $\theta$.

A look at the values of $e(\tilde X, W)$ shows that $\tilde X$ is worse than $W$ for distributions with light tails but does slightly better than $W$ for heavier-tailed $F$.
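Formulas (28)–(30) need only $\sigma^2$, $f(0)$, and $\int f^2$. The sketch below plugs these three quantities into the formulas for the densities of the ARE table above and recovers the tabulated entries. The closed-form values of $\int f^2$, such as $1/(2\sqrt{\pi})$ for the normal and $1/6$ for the logistic, are standard calculations supplied by us rather than quoted from the text. In effect, this is Problem 4 of Problems 13.7 done numerically.

```python
from math import inf, pi, sqrt

# (variance sigma^2, f(0), integral of f^2) for each standardized density in the table
params = {
    "uniform U(-1/2, 1/2)": (1 / 12, 1.0, 1.0),
    "normal N(0, 1)": (1.0, 1 / sqrt(2 * pi), 1 / (2 * sqrt(pi))),
    "logistic": (pi**2 / 3, 0.25, 1 / 6),
    "double exponential": (2.0, 0.5, 0.25),
    "Cauchy C(0, 1)": (inf, 1 / pi, 1 / (2 * pi)),
}

def ares(sigma2, f0, int_f2):
    """Return e(median, mean), e(W, mean), e(median, W) from (28), (29), (30)."""
    e_med_mean = 4 * sigma2 * f0**2
    e_w_mean = 12 * sigma2 * int_f2**2
    e_med_w = f0**2 / (3 * int_f2**2)   # right side of (30); finite even if sigma2 = inf
    return e_med_mean, e_w_mean, e_med_w

for name, p in params.items():
    print(f"{name:22s}", [round(v, 3) for v in ares(*p)])
```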

Let us now compare the AREs of $\bar X_\alpha$, $\bar X$, and $W$. The following AREs for selected $\alpha$ are due to Bickel [4].


ARE Comparisons

$$\begin{array}{lcccc} F & e(\bar X_{0.01}, \bar X) & e(W, \bar X_{0.01}) & e(\bar X_{0.05}, \bar X) & e(W, \bar X_{0.05}) \\ \hline \text{Uniform} & 0.96 & 1.04 & 0.83 & 1.20 \\ \text{Normal} & 0.995 & 0.96 & 0.97 & 0.985 \\ \text{Double exponential} & 1.06 & 1.41 & 1.21 & 1.24 \\ \text{Cauchy} & \infty & 6.72 & \infty & 2.67 \end{array}$$


We note that $\bar X_\alpha$ performs quite well compared to $\bar X$. In fact, for the normal distribution the efficiency is quite close to 1, so there is little loss in using $\bar X_\alpha$. For heavier-tailed distributions, $\bar X_\alpha$ is preferable. For small values of $\alpha$, it should be noted that $\bar X_\alpha$ does not differ much from $\bar X$. Nevertheless, $\bar X_\alpha$ is more robust: it cannot do much worse than $\bar X$ but can do much better. Compared to the Hodges–Lehmann estimator, $\bar X_\alpha$ does not perform as well; $W$ provides better protection against outliers (heavy tails) and gives up little in the normal case.

Finally, we consider testing $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$. Recall that $X_1, X_2, \ldots, X_n$ are iid with common continuous symmetric DF $F(x - \theta)$, $\theta \in \mathbb{R}$, and PDF $f(x - \theta)$. Suppose that $\sigma_F^2 = \operatorname{var}(X_1) < \infty$. Let $S$ denote the sign test based on the statistic $R^+(X) = \sum_{i=1}^{n} I[X_i > \theta_0]$, $W$ the Wilcoxon signed-rank test based on the statistic $T^+(X) = \sum_{1 \le i \le j \le n} I[X_i + X_j > 2\theta_0]$, $M$ the test based on the $Z$-statistic $Z = \sqrt{n}(\bar X - \theta_0)/\sigma_F$, and $t$ Student's t-test based on the statistic $\sqrt{n}(\bar X - \theta_0)/S$, where $S^2$ is the sample variance.

First note that $e(t, M) = 1$. Next we note that $e_F(S, t) = e_F(\tilde X, \bar X)$ and $e_F(W, t) = e_F(W, \bar X)$, so that the AREs are the same as given in (28), (29), and (30), and the values of ARE given in the table for various $F$ remain the same for the corresponding tests.

Similar remarks apply as in the case of estimation of $\theta$. The sign test is, in general, not as efficient as the Wilcoxon signed-rank test. But for heavier-tailed distributions such as the Cauchy and double exponential, the sign test does better than the Wilcoxon signed-rank test.


PROBLEMS 13.7 


1. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and $\operatorname{cov}(X_i, X_j) = \rho\sigma^2$ if $|i - j| = 1$, $i \ne j$, and $= 0$ otherwise.

(a) Show that

$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n}\left[1 + 2\rho\left(1 - \frac{1}{n}\right)\right] \quad \text{and} \quad E(S^2) = \sigma^2\left(1 - \frac{2\rho}{n}\right).$$

(b) Show that the t-statistic $\sqrt{n}(\bar X - \mu)/S$ is asymptotically normally distributed with mean 0 and variance $1 + 2\rho$. Conclude that the significance of $t$ is overestimated for positive values of $\rho$ and underestimated for $\rho < 0$ in large samples.

(c) For finite $n$, consider the statistic

$$T^2 = \frac{n(\bar X - \mu)^2}{S^2}.$$

Compare the expected values of the numerator and the denominator of $T^2$ and study the effect of $\rho \ne 0$ to interpret significant $t$ values. (Scheffé [99, p. 338])
2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$.
(a) Show that $\mu_2 = \alpha\beta^2$ and $\mu_4 = 3\alpha(\alpha + 2)\beta^4$.

(b) Show that

$$\operatorname{var}\left(\frac{S^2}{\sigma^2}\right) \approx \frac{1}{n - 1}\left(2 + \frac{6}{\alpha}\right).$$

(c) Show that the large-sample distribution of $(n - 1)S^2/\sigma^2$ is normal.

(d) Compare the large-sample test of $H_0: \sigma = \sigma_0$ based on the asymptotic normality of $(n - 1)S^2/\sigma^2$ with the large-sample test based on the same statistic when the observations are taken from a normal population. In particular, take $\alpha = 2$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples from populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $\bar X, \bar Y$ be the two sample means, and $S_1^2, S_2^2$ the two sample variances. Write $N = m + n$, $R = m/n$, and $\theta = \sigma_1^2/\sigma_2^2$. The usual normal theory test of $H_0: \mu_1 - \mu_2 = \delta_0$ is the t-test based on the statistic

$$T = \frac{\bar X - \bar Y - \delta_0}{S_p(1/m + 1/n)^{1/2}},$$

where

$$S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}.$$

Under $H_0$, the statistic $T$ has a t-distribution with $N - 2$ d.f., provided that $\sigma_1^2 = \sigma_2^2$. Show that the asymptotic distribution of $T$ in the nonnormal case is $N(0, (\theta + R)(1 + R\theta)^{-1})$ for large $m$ and $n$. Thus if $R = 1$, $T$ is asymptotically $N(0, 1)$, as in the normal theory case assuming equal variances, even though the two samples come from nonnormal populations with unequal variances. Conclude that the test is robust in the case of large, equal sample sizes. (Scheffé [99, p. 339])


4. Verify the ARE computations for F in the table above using the expressions for ARE in (28), (29), and (30).


5. Suppose that F is a $G(\alpha, \beta)$ RV. Show that

$e(W, \bar X) = \dfrac{3\alpha\,\Gamma^2(2\alpha)}{2^{4\alpha - 4}(2\alpha - 1)^2\,\Gamma^4(\alpha)}$.

(Note that F is not symmetric.)


6. Suppose that F has PDF

$f(x) = \dfrac{\Gamma(m)}{\Gamma(1/2)\,\Gamma(m - 1/2)\,(1 + x^2)^{m}}, \quad -\infty < x < \infty,$

for $m > 1$. Compute $e(\tilde X, \bar X)$, $e(W, \bar X)$, and $e(\tilde X, W)$. (From Problem 3.2.3, $E|X|^k < \infty$ if $k < 2m - 1$.)


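Several of the problems above reduce to closed-form quantities that are easy to evaluate. The Python sketch below is illustrative only, and all function names in it are hypothetical. The cutoff 1.96 assumes a nominal two-sided 5% test. The exact variance of $S^2$ uses the standard formula $\operatorname{var}(S^2) = [\mu_4 - \sigma^4(n-3)/(n-1)]/n$, with gamma kurtosis $3 + 6/\alpha$. Finally, `are_gamma` assumes that $e(W, \bar X)$ is the Pitman efficiency $12\sigma^2(\int f^2\,dx)^2$ of the Wilcoxon-type procedure relative to $\bar X$; this may not match the form of the book's expressions (28) to (30).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def size_serial_t(rho, z_crit=1.96):
    """Problem 1(b): large-sample size of the nominal two-sided t-test when
    sqrt(n)(Xbar - mu)/S is asymptotically N(0, 1 + 2*rho)."""
    if rho <= -0.5:
        raise ValueError("need 1 + 2*rho > 0")
    return 2 * (1 - Z.cdf(z_crit / math.sqrt(1 + 2 * rho)))

def var_scaled_s2(n, kurtosis):
    """Problem 2: exact var[(n-1) S^2 / sigma^2] for i.i.d. data whose
    kurtosis mu4/sigma^4 is given, via var(S^2) = (mu4 - sigma^4 (n-3)/(n-1))/n."""
    return (n - 1) ** 2 / n * (kurtosis - (n - 3) / (n - 1))

def var_scaled_s2_approx(n, alpha):
    """Problem 2(b): large-sample value (n-1)(2 + 6/alpha) for G(alpha, beta) data."""
    return (n - 1) * (2 + 6 / alpha)

def var_two_sample_T(theta, R):
    """Problem 3: limiting variance (theta + R)/(1 + R*theta) of the pooled T."""
    return (theta + R) / (1 + R * theta)

def are_gamma(alpha):
    """Problem 5 (assumed form): 12 sigma^2 (integral of f^2)^2 for a
    G(alpha, beta) density; the scale beta cancels. Computed on the log scale."""
    if alpha <= 0.5:
        raise ValueError("integral of f^2 is finite only for alpha > 1/2")
    log_val = (math.log(3 * alpha) + 2 * math.lgamma(2 * alpha)
               - (4 * alpha - 4) * math.log(2)
               - 2 * math.log(2 * alpha - 1) - 4 * math.lgamma(alpha))
    return math.exp(log_val)
```

For instance, `size_serial_t(0.3)` exceeds 0.05, which is the overstated significance described in Problem 1(b). `var_two_sample_T(theta, 1.0)` equals 1 for every θ, which is the robustness claim of Problem 3. `are_gamma(1.0)` gives 3 up to rounding, the familiar value for the exponential distribution.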
Table ST1. Cumulative Binomial Probabilities, $\sum_{j=0}^{r}\binom{n}{j}p^{j}(1-p)^{n-j}$, $r = 0, 1, 2, \ldots, n - 1$


0.01 


0.9801 
0.9999 
0.9703 
0.9997 
1.0000 
0.9606 
0.9994 
1.0000 




0.9510 
0.9990 
1.0000 


0.9415 
0.9986 
1.0000 


0.9321 
0.9980 
1.0000 


0.9227 
0.9973 
0.9999 
1.0000 


0.9135 
0.9965 
0.9999 
1.0000 
0.05


0.10 








0.20 


0.25 


0.30 


0.333 


0.40 


0.50 





0.9025 
0.9975 
0.8574 
0.9928 
0.9999 
0.8145 
0.9860 
0.9995 
1.0000 
0.7738 
0.9774 
0.9988 
0.9999 
1.0000 
0.7351 
0.9672 
0.9977 
0.9998 
0.9999 
1.0000 
0.6983 
0.9556 
0.9962 
0.9998 
1.0000 


0.6634 
0.9427 
0.9942 
0.9996 
1.0000 


0.6302 
0.9287 
0.9916 
0.9993 
0.9999 


0.8100 
0.9900 
0.7290 
0.9720 
0.9990 
0.6561 
0.9477 
0.9963 
0.9999 
0.5905 
0.9185 
0.9914 
0.9995 
1.0000 
0.5314 
0.8857 
0.9841 
0.9987 
0.9999 
1.0000 
0.4783 
0.6554 
0.8503 
0.9743 
0.9973 
0.9998 
1.0000 
0.4305 
0.8131 
0.9619 
0.9950 
0.9996 
1.0000 


0.3874 
0.7748 
0.9470 
0.9916 
0.9990 


0.6400 
0.9600 
0.5120 
0.8960 
0.9920 
0.4096 
0.8192 
0.9728 
0.9984 
0.3277 
0.7373 
0.9421 
0.9933 
0.9997 
0.2621 
0.6553 
0.9011 
0.9830 
0.9984 
0.9999 
0.2097 
0.5767 
0.8520 
0.9667 
0.9953 
0.9996 
1.0000 
0.1678 
0.5033 
0.7969 
0.9437 
0.9896 
0.9988 
1.0000 


0.1342 
0.4362 
0.7382 
0.9144 
0.9805 


0.5625 
0.9375 
0.4219 
0.8438 
0.9844 
0.3164 
0.7383 
0.9492 
0.9961 
0.2373 
0.6328 
0.8965 
0.9844 
0.9990 
0.1780 


0.5340 


0.8306 
0.9624 
0.9954 
0.9998 
0.1335 
0.4450 
0.7565 
0.9295 
0.9872 
0.9987 
0.9999 
0.1001 
0.3671 
0.6786 
0.8862 
0.9727 
0.9958 
0.9996 
1.0000 
0.0751 
0.3004 
0.6007 
0.8343 
0.9511 


0.4900 
0.9100 
0.3430 
0.7840 
0.9730 
0.2401 
0.6517 
0.9163 
0.9919 
0.1681 
0.5283 
0.8370 
0.9693 
0.9977 
0.1176 
0.4201 
0.7442 
0.9294 
0.9889 
0.9991 
0.0824 
0.3294 
0.6471 
0.8740 
0.9712 
0.9962 
0.9998 
0.0576 
0.2553 
0.5518 
0.8059 
0.9420 
0.9887 
0.9987 
0.9999 
0.0404 
0.1960 
0.4628 
0.7296 
0.9011 


0.4444 
0.8888 
0.2963 
0.7407 
0.9629 
0.1975 
0.5926 
0.8889 
0.9877 
0.1317 
0.4609 
0.7901 
0.9547 
0.9959 
0.0878 
0.3512 
0.6804 
0.8999 
0.9822 
0.9987 
0.0585 
0.2633 
0.5706 
0.8267 
0.9547 
0.9931 
0.9995 
0.0390 
0.1951 
0.4682 
0.7413 
0.9120 
0.9803 
0.9974 
0.9998 
0.0260 
0.1431 
0.3772 
0.6503 
0.8551 


0.3600 
0.8400 
0.2160 
0.6480 
0.9360 
0.1296 
0.4742 
0.8198 
0.9734 
0.0778 
0.3370 
0.6826 
0.9130 
0.9898 
0.0467 
0.2333 
0.5443 
0.8208 
0.9590 
0.9959 
0.0280 
0.1586 
0.4199 
0.7102 
0.9037 
0.9812 
0.9984 
0.0168 
0.1064 
0.3154 
0.5941 
0.8263 
0.9502 
0.9915 
0.9993 
0.0101 
0.0706 
0.2318 
0.4826 
0.7334 


0.2500 
0.7500 
0.1250 
0.5000 
0.8750 
0.0625 
0.3125 
0.6875 
0.9375 
0.0312 
0.1874 
0.4999 
0.8124 
0.9686 
0.0156 
0.1094 
0.3438 
0.6563 
0.8907 
0.9845 
0.0078 
0.0625 
0.2266 
0.5000 
0.7734 
0.9375 
0.9922 
0.0039 
0.0352 
0.1445 
0.3633 
0.6367 
0.8555 
0.9648 
0.9961 
0.0020 
0.0196 
0.0899 
0.2540 
0.5001 
0.9044
0.9958 
1.0000 


0.8954 
0.9948 
0.9998 
1.0000 


0.8864 
0.9938 
0.9998 
1.0000 
1.0000 
1.0000 


0.8775 
0.9928 
0.9997 
1.0000 


1.0000 


0.5987 
0.9138 
0.9884 
0.9989 
0.9999 
1.0000 


0.5688 
0.8981 
0.9848 
0.9984 
0.9999 
1.0000 


0.5404 
0.8816 
0.9804 
0.9978 
0.9998 
1.0000 


0.5134 
0.8746 
0.9755 
0.9969 
0.9997 
1.0000 


0.9998 
0.9999 
1.0000 


0.3487 
0.7361 
0.9298 
0.9872 
0.9984 
0.9999 
1.0000 


0.3138 
0.6974 
0.9104 
0.9815 
0.9972 
0.9997 
1.0000 


0.2824 
0.6590 
0.8892 
0.9744 
0.9957 
0.9995 
1.0000 


0.2542 
0.6214 
0.8661 
0.9659 
0.9936 
0.9991 


0.9970 
0.9998 
1.0000 


0.1074 
0.3758 
0.6778 
0.8791 
0.9672 
0.9936 
0.9991 
0.9999 
1.0000 


0.0859 
0.3221 
0.6174 
0.8389 
0.9496 
0.9884 
0.9981 
0.9998 
1.0000 


0.0687 
0.2749 
0.5584 
0.7946 
0.9806 
0.9961 
0.9994 
0.9999 
1.0000 


0.0550 
0.2337 
0.5017 
0.7473 
0.9009 
0.9700 


0.25 


0.30 


0.333 


0.40 


0.50 





0.9900 
0.9987 
0.9999 
1.0000 
0.0563 
0.2440 
0.5256 
0.7759 
0.9219 
0.9803 
0.9965 
0.9996 
1.0000 


0.0422 
0.1971 
0.4552 
0.7133 
0.8854 
0.9657 
0.9924 
0.9988 
0.9999 
1.0000 


0.0317 
0.1584 
0.3907 
0.6488 
0.8424 
0.9456 
0.9858 
0.9972 
0.9996 

1.0000


0.0238 
0.1267 
0.3326 
0.5843 
0.7940 
0.9198 


0.9746 
0.9956 
0.9995 
0.9999 
0.0282 
0.1493 
0.3828 
0.6496 
0.8497 
0.9526 
0.9894 
0.9984 
0.9998 
1.0000 
0.0198 
0.1130 
0.3128 
0.5696 
0.7897 
0.9218 
0.9784 
0.9947 
0.9994 
0.9999 
1.0000 
0.0139 
0.0850 
0.2528 
0.4925 
0.7237 
0.8822 
0.9614 
0.9905 
0.9983 
0.9998 
1.0000 


0.0097 
0.0637 
0.2025 
0.4206 
0.6543 
0.8346 


0.9575 
0.9916 
0.9989 
0.9998 
0.0173 
0.1040 
0.2991 
0.5592 
0.7868 
0.9234 
0.9803 
0.9966 
0.9996 
0.9999 
0.0116 
0.0752 
0.2341 
0.4726 
0.7110 
0.8779 
0.9614 
0.9912 
0.9986 
0.9999 
1.0000 
0.0077 
0.0540 
0.1811 
0.3931 
0.6315 
0.8223 
0.9336 
0.9812 
0.9962 
0.9995 
0.9999 
1.0000 
0.0052 
0.0386 
0.1388 
0.3224 
0.5521 
0.7587 


0.9006 
0.9749 
0.9961 
0.9996 
0.0060 
0.0463 
0.1672 
0.3812 
0.6320 
0.8327 
0.9442 
0.9867 
0.9973 
0.9999 
0.0036 
0.0320 
0.1189 
0.2963 
0.5328 
0.7535 
0.9007 
0.9707 
0.9941 
0.9993 
1.0000 
0.0022 
0.0196 
0.0835 
0.2254 
0.4382 
0.6652 
0.8418 
0.9427 
0.9848 
0.9972 
0.9997 
1.0000 
0.0013 
0.0126 
0.0579 
0.1686 
0.3531 
0.5744 


0.7462 
0.9103 
0.9806 
0.9982 
0.0010 
0.0108 
0.0547 
0.1719 
0.3770 
0.6231 
0.8282 
0.9454 
0.9893 
0.9991 
0.0005 
0.0059 
0.0327 
0.1133 
0.2744 
0.5000 
0.7256 
0.8867 
0.9673 
0.9941 
0.9995 
0.0002 
0.0032 
0.0193 
0.0730 
0.1939 
0.3872 
0.6128 
0.8062 
0.9270 
0.9807 
0.9968 
0.9998 
0.0000 
0.0017 
0.0112 
0.0462 
0.1334 
0.2905 
0.01


0.8687 
0.9916 
0.9997 
1.0000 






15 0.8601 
0.9904 
0.9996 


1.0000 




0.05 


0.4877 
0.8470 
0.9700 
0.9958 
0.9996 
1.0000 


0.4633 
0.8291 
0.9638 
0.9946 
0.9994 
1.0000 





0.10 


0.20 


p
0.25 


0.30 





0.9999 
1.0000 


0.2288 
0.5847 
0.8416 
0.9559 
0.9908 
0.9986 
0.9998 
1.0000 


0.2059 
0.5491 
0.8160 
0.9444 
0.9873 
0.9978 
0.9997 
1.0000 


0.9930 
0.9988 


0.9998 
1.0000 


0.0440 
0.1979 
0.4480 
0.6982 
0.8702 
0.9562 
0.9884 
0.9976 
0.9996 
1.0000 


0.0352 
0.1672 
0.3980 
0.6482 
0.8358 
0.9390 
0.9820 
0.9958 
0.9992 
0.9999 
1.0000 


0.9757 
0.9944 


0.9990 
0.9999 
1.0000 


0.0178 
0.1010 
0.2812 
0.5214 
0.7416 
0.8884 
0.9618 
0.9897 
0.9979 
0.9997 
1.0000 


0.0134 
0.0802 
0.2361 
0.4613 
0.6865 
0.8516 
0.9434 
0.9827 
0.9958 
0.9992 
0.9999 
1.0000 


0.9376 
0.9818 


0.9960 
0.9994 
0.9999 
1.0000 


0.0068 
0.0475 
0.1608 
0.3552 
0.5842 
0.7805 
0.9067 
0.9686 
0.9917 
0.9984 
0.9998 
1.0000 


0.0048 
0.0353 
0.1268 
0.2969 
0.5255 
0.7216 
0.8689 
0.9500 
0.9848 


0.9964 


0.9993 
0.9999 
1.0000 
0.333


0.8965 
0.9654 


0.9912 
0.9984 
0.9998 
1.0000 


0.0034 
0.0274 
0.1054 
0.2612 
0.4755 
0.6898 
0.8506 
0.9424 
0.9826 
0.9960 
0.9993 
0.9999 
1.0000 


0.0023 
0.0194 
0.0794 
0.2092 
0.4041 
0.6184 
0.7970 
0.9118 
0.9692 
0.9915 
0.9982 
0.9997 
1.0000 


0.40 


0.7712 
0.9024 


0.9679 
0.9922 
0.9987 
0.9999 
1.0000 
0.0008 
0.0081 
0.0398 
0.1243 
0.2793 
0.4859 
0.6925 
0.8499 
0.9417 
0.9825 
0.9961 
0.9994 
0.9999 


0.0005 
0.0052 
0.0271 
0.0905 
0.2173 
0.4032 
0.6098 
0.7869 
0.9050 
0.9662 
0.9907 
0.9981 
0.9997 
1.0000 


0.50 


0.5000 
0.7095 


0.8666 
0.9539 
0.9888 
0.9983 
0.9999 
0.0000 
0.0009 
0.0065 
0.0287 
0.0898 
0.2120 
0.3953 
0.6048 
0.7880 
0.9102 
0.9713 
0.9936 
0.9991 
0.9999 
0.0000 
0.0005 
0.0037 
0.0176 
0.0592 
0.1509 
0.3036 
0.5000 
0.6964 
0.8491 
0.9408 
0.9824 
0.9963 
0.9995 
1.0000 





Source: For n = 2 through 10, adapted with permission from E. Parzen, Modern Probability Theory and Its Applications, Wiley, New York, 1962. For n = 11 through 15, adapted with permission from Tables of the Cumulative Binomial Probability Distribution, Harvard University Press, Cambridge, Mass., 1955.
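Entries of Table ST1 can be recomputed directly from the defining sum in its title. The following is a minimal sketch; `binom_cdf` is a hypothetical helper name, and the last decimal may differ from the printed value where the source tables rounded differently.

```python
import math

def binom_cdf(n, r, p):
    """P(X <= r) for X ~ b(n, p): the sum over j = 0..r of C(n, j) p^j (1 - p)^(n - j)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r + 1))

# Example: the n = 7, p = 0.10 column of the table, rounded to four places.
print([round(binom_cdf(7, r, 0.10), 4) for r in range(7)])
```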
Table ST2. Tail Probability Under Standard Normal Distribution*

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

Source: Adapted with permission from P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 391.

*This table gives the probability that the standard normal variable Z will exceed a given positive value z, that is, $P\{Z > z_\alpha\} = \alpha$. The probabilities for negative values of z are obtained by symmetry.
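The tail probabilities in Table ST2 can be reproduced from the complementary error function, because $P\{Z > z\} = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$. The following is a minimal sketch using only the standard library; `normal_tail` is a hypothetical name.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# One row of the table (z = 1.90, 1.91, ..., 1.99) for comparison with the printed values.
print([round(normal_tail(1.9 + k / 100), 4) for k in range(10)])
```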
Table ST4. Student's t-Distribution*





α

n 0.10 0.05 0.025 0.01 0.005
1 3.078 6.314 12.706 31.821 63.657 
2 1.886 2.920 4.303 6.965 9.925 
3 1.638 2.353 3.182 4.541 5.841 
4 1.533 2.132 2.776 3.747 4.604 
5 1.476 2.015 2.571 3.365 4.032 
6 1.440 1.943 2.447 3.143 3.707 
7 1.415 1.895 2.365 2.998 3.499 
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250 
10 1.372 1.812 2.228 2.764 3.169 
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055 
13 1.350 1.771 2.160 2.650 3.012 
14 1.345 1.761 2.145 2.624 2.977 
15 1.341 1.753 2.131 2.602 2.947 
16 1.337 1.746 2.120 2.583 2.921 
17 1.333 1.740 2.110 2.567 2.898 
18 1.330 1.734 2.101 2.552 2.878 
19 1.328 1.729 2.093 2.539 2.861 
20 1.325 1.725 2.086 2.528 2.845 
21 1.323 1.721 2.080 2.518 2.831 
22 1.321 1.717 2.074 2.508 2.819 
23 1.319 1.714 2.069 2.500 2.807 
24 1.318 1.711 2.064 2.492 2.797 
25 1.316 1.708 2.060 2.485 2.787 
26 1.315 1.706 2.056 2.479 2.779 
27 1.314 1.703 2.052 2.473 2.771 
28 1.313 1.701 2.048 2.467 2.763 
29 1.311 1.699 2.045 2.462 2.756 
30 1.310 1.697 2.042 2.457 2.750 
40 1.303 1.684 2.021 2.423 2.704 
60 1.296 1.671 2.000 2.390 2.660 
120 1.289 1.658 1.980 2.358 2.617 
∞ 1.282 1.645 1.960 2.326 2.576





Source: P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393. Reprinted by permission of John Wiley & Sons, Inc.

*The first column lists the number of degrees of freedom (n). The headings of the other columns give probabilities (α) for t to exceed the entry value. Use symmetry for negative t values.
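The entries of Table ST4 are upper-α points of the t density with n degrees of freedom. The following is a minimal standard-library sketch of how such a point can be found. It integrates the density with Simpson's rule and then bisects on the tail probability. The helper names, the 2000 Simpson steps, and the starting bracket are arbitrary assumptions, not the method used to build the printed table.

```python
import math

def t_pdf(x, nu):
    """Student t density with nu degrees of freedom (constant computed on the log scale)."""
    logc = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(logc - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_upper_tail(c, nu, steps=2000):
    """P(T > c) for c >= 0, as 1/2 minus the Simpson-rule integral of the density over [0, c]."""
    h = c / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - s * h / 3

def t_critical(nu, alpha):
    """Value t with P(T > t) = alpha, found by bracketing and then bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > alpha:  # widen the bracket until the tail is small enough
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For the ∞ row, the normal quantile `statistics.NormalDist().inv_cdf(1 - alpha)` can be used instead.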
Table ST6. Random Normal Numbers, $\mu = 0$ and $\sigma = 1$
1 2 3 4 5 6 7 8 9 10

0.464 0.137 2.455 -0.323 -0.068 0.290 -0.288 1.298 0.241 -0.957
0.060 -2.526 -0.531 -0.194 0.543 -1.558 0.187 -1.190 0.022 0.525
1.486 -0.354 -0.634 0.697 0.926 1.375 0.785 -0.963 -0.853 -1.865
1.002 -0.472 1.279 3.521 0.571 -1.851 0.194 1.192 -0.501 -0.273
1.394 -0.555 0.046 0.321 2.945 1.974 -0.258 0.412 0.439 -0.035
0.906 -0.513 -0.525 0.595 0.881 -0.934 1.579 0.161 -1.885 0.371
1.179 -1.055 0.007 0.769 0.971 0.712 1.090 -0.631 -0.255 -0.702
-1.501 -0.488 -0.162 -0.136 1.033 0.203 0.448 0.748 -0.423 -0.432
-0.690 0.756 -1.618 -0.345 -0.511 -2.051 -0.457 -0.218 0.857 -0.465
1.372 0.225 0.378 0.761 0.181 -0.736 0.960 -1.530 -0.260 0.120
-0.482 1.678 -0.057 -1.229 -0.486 0.856 -0.491 -1.983 -2.830 -0.238
-1.376 -0.150 1.356 -0.561 -0.256 -0.212 0.219 0.779 0.953 -0.869
-1.010 0.598 -0.918 1.598 0.065 0.415 -0.169 0.313 -0.973 -1.016
-0.005 -0.899 0.012 -0.725 1.147 -0.121 1.096 0.481 -1.691 0.417
1.393 1.163 -0.911 1.231 -0.199 -0.246 1.239 -2.574 -0.558 0.056
-1.787 -0.261 1.237 1.046 -0.508 -1.630 -0.146 -0.392 -0.627 0.561
-0.105 -0.357 -1.384 0.360 -0.992 -0.116 -1.698 -2.832 -1.108 -2.357
-1.339 1.827 -0.959 0.424 0.969 -1.141 -1.041 0.362 -1.726 1.956
1.041 0.535 0.731 1.377 0.983 -1.330 1.620 -1.040 0.524 -0.281
0.279 -2.056 0.717 -0.873 -1.096 -1.396 1.047 0.089 -0.573 0.932
1.805 -2.008 -1.633 0.542 0.250 -0.166 0.032 0.079 0.471 -1.029
-1.186 1.180 1.114 0.882 1.265 -0.202 0.151 -0.376 -0.310 0.479
0.658 -1.141 1.151 -1.210 0.927 0.425 0.290 -0.902 0.610 2.709
-0.439 0.358 -1.939 0.891 -0.227 0.602 0.873 -0.437 -0.220 -0.057
-1.399 -0.230 0.385 -0.649 -0.577 0.237 -0.289 0.513 0.738 -0.300
0.199 0.208 -1.083 -0.219 -0.291 1.221 1.119 0.004 -2.015 -0.594
0.159 0.272 -0.313 0.084 -2.828 -0.430 -0.792 -1.275 -0.623 -1.047
2.273 0.606 0.606 -0.747 0.247 1.291 0.063 -1.793 -0.699 -1.347
0.0401 -0.307 0.121 0.790 -0.584 0.541 0.484 -0.986 0.481 0.996
-1.132 -2.008 0.921 0.145 0.446 -1.661 1.045 -1.363 -0.586 -1.023
0.768 0.079 -1.473 0.034 -2.127 0.665 0.084 -0.880 -0.579 0.551
0.375 -1.658 -0.851 0.234 -0.656 0.340 -0.086 -0.158 -0.120 0.418
-0.513 -0.344 0.210 -0.736 1.041 0.008 0.427 -0.831 0.191 0.074
0.292 -0.521 1.266 -1.206 -0.899 0.110 -0.528 -0.813 0.071 0.524
1.026 2.990 -0.574 -0.491 -1.114 1.297 -1.433 -1.345 -3.001 0.479
-1.334 1.278 -0.568 -0.109 -0.515 -0.566 2.923 0.500 0.359 0.326
-0.287 -0.144 -0.254 0.574 -0.451 -1.181 -1.190 -0.318 -0.094 1.114
0.161 -0.886 -0.921 -0.509 1.410 -0.518 0.192 -0.432 1.501 1.068
-1.346 0.193 -1.202 0.394 -1.045 0.843 0.942 1.045 0.031 0.772
1.250 -0.199 -0.288 1.810 1.378 0.584 1.216 0.733 0.402 0.226
0.600 -0.537 0.782 0.060 0.499 -0.431 1.705 1.164 0.884 -0.298
0.375 -1.941 0.247 -0.491 0.665 -0.135 -0.145 -0.498 0.457 1.064
-1.20 0.489 -1.711 -1.186 0.754 -0.732 -0.066 1.006 -0.798 0.162
-0.151 -0.243 -0.430 -0.762 0.298 1.049 1.810 2.885 -0.768 -0.129
-0.309 0.531 0.416 -1.541 1.456 2.040 -0.124 0.196 0.023 -1.204
0.424 -0.444 0.593 0.993 -0.106 0.116 0.484 -1.272 1.066 1.097
0.503 0.658 -1.127 -1.407 -1.579 -1.616 1.458 1.262 0.736 -0.916
0.862 -0.885 -0.142 -0.504 0.532 1.381 0.022 -0.281 -0.342 1.222
0.235 -0.628 -0.023 -0.463 -0.899 -0.394 -0.538 1.707 -0.188 -1.153
-0.853 0.400 0.777 0.833 0.410 -0.349 -1.0904 0.580 1.395 1.298

Source: From tables of the RAND Corporation, by permission.
Table ST7. Critical Values of the Kolmogorov-Smirnov One-Sample Test Statistic*

One-Sided Test: α = 0.10 0.05 0.025 0.01 0.005
Two-Sided Test: α = 0.20 0.10 0.05 0.02 0.01

n = 1 0.900 0.950 0.975 0.990 0.995
2 0.684 0.776 0.842 0.900 0.929
3 0.565 0.636 0.708 0.785 0.829
4 0.493 0.565 0.624 0.689 0.734
5 0.447 0.509 0.563 0.627 0.669
6 0.410 0.468 0.519 0.577 0.617
7 0.381 0.436 0.483 0.538 0.576
8 0.358 0.410 0.454 0.507 0.542
9 0.339 0.387 0.430 0.480 0.513
10 0.323 0.369 0.409 0.457 0.489
11 0.308 0.352 0.391 0.437 0.468
12 0.296 0.338 0.375 0.419 0.449
13 0.285 0.325 0.361 0.404 0.432
14 0.275 0.314 0.349 0.390 0.418
15 0.266 0.304 0.338 0.377 0.404
16 0.258 0.295 0.327 0.366 0.392
17 0.250 0.286 0.318 0.355 0.381
18 0.244 0.279 0.309 0.346 0.371
19 0.237 0.271 0.301 0.337 0.361
20 0.232 0.265 0.294 0.329 0.352
21 0.226 0.259 0.287 0.321 0.344
22 0.221 0.253 0.281 0.314 0.337
23 0.216 0.247 0.275 0.307 0.330
24 0.212 0.242 0.269 0.301 0.323
25 0.208 0.238 0.264 0.295 0.317
26 0.204 0.233 0.259 0.290 0.311
27 0.200 0.229 0.254 0.284 0.305
28 0.197 0.225 0.250 0.279 0.300
29 0.193 0.221 0.246 0.275 0.295
30 0.190 0.218 0.242 0.270 0.290
31 0.187 0.214 0.238 0.266 0.285
32 0.184 0.211 0.234 0.262 0.281
33 0.182 0.208 0.231 0.258 0.277
34 0.179 0.205 0.227 0.254 0.273
35 0.177 0.202 0.224 0.251 0.269
36 0.174 0.199 0.221 0.247 0.265
37 0.172 0.196 0.218 0.244 0.262
38 0.170 0.194 0.215 0.241 0.258
39 0.168 0.191 0.213 0.238 0.255
40 0.165 0.189 0.210 0.235 0.252
Approximation for n > 40: 1.07/√n 1.22/√n 1.36/√n 1.52/√n 1.63/√n

Source: Adapted by permission from Table 1 of Leslie H. Miller, Table of percentage points of Kolmogorov statistics, J. Am. Stat. Assoc. 51 (1956), 111-121.

*This table gives the values of $D^{+}_{n,\alpha}$ and $D_{n,\alpha}$ for which $\alpha \ge P\{D^{+}_{n} > D^{+}_{n,\alpha}\}$ and $\alpha \ge P\{D_{n} > D_{n,\alpha}\}$ for some selected values of n and α.
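To use Table ST7, one computes $D_n^{+} = \sup_x [F_n(x) - F(x)]$ and $D_n = \sup_x |F_n(x) - F(x)|$ from the data and a continuous null df F. Because the empirical df is a step function, the suprema occur at the sorted observations. The following is a minimal sketch; `ks_one_sample` is a hypothetical name.

```python
def ks_one_sample(sample, cdf):
    """Return (D_n^+, D_n) for a sample and a continuous null distribution function `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps to (i + 1)/n at the (i + 1)-th order statistic; its left limit there is i/n.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, max(d_plus, d_minus)

def uniform_cdf(x):
    """Distribution function of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)

d_plus, d = ks_one_sample([0.1, 0.5, 0.9], uniform_cdf)
```

With n = 3, the observed D is compared with the table entry in the row n = 3 for the chosen α.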
Table ST8. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of Equal Size*

One-Sided Test: α = 0.10 0.05 0.025 0.01 0.005
Two-Sided Test: α = 0.20 0.10 0.05 0.02 0.01

n = 3 2/3 2/3
4 3/4 3/4 3/4
5 3/5 3/5 4/5 4/5 4/5
6 3/6 4/6 4/6 5/6 5/6
7 4/7 4/7 5/7 5/7 5/7
8 4/8 4/8 5/8 5/8 6/8
9 4/9 5/9 5/9 6/9 6/9
10 4/10 5/10 6/10 6/10 7/10
11 5/11 5/11 6/11 7/11 7/11
12 5/12 5/12 6/12 7/12 7/12
13 5/13 6/13 6/13 7/13 8/13
14 5/14 6/14 7/14 7/14 8/14
15 5/15 6/15 7/15 8/15 8/15
16 6/16 6/16 7/16 8/16 9/16
17 6/17 7/17 7/17 8/17 9/17
18 6/18 7/18 8/18 9/18 9/18
19 6/19 7/19 8/19 9/19 9/19
20 6/20 7/20 8/20 9/20 10/20
21 6/21 7/21 8/21 9/21 10/21
22 7/22 8/22 8/22 10/22 10/22
23 7/23 8/23 9/23 10/23 10/23
24 7/24 8/24 9/24 10/24 11/24
25 7/25 8/25 9/25 10/25 11/25
26 7/26 8/26 9/26 10/26 11/26
27 7/27 8/27 9/27 11/27 11/27
28 8/28 9/28 10/28 11/28 12/28
29 8/29 9/29 10/29 11/29 12/29
30 8/30 9/30 10/30 11/30 12/30
31 8/31 9/31 10/31 11/31 12/31
32 8/32 9/32 10/32 12/32 12/32
34 8/34 10/34 11/34 12/34 13/34
36 9/36 10/36 11/36 12/36 13/36
38 9/38 10/38 11/38 13/38 14/38
40 9/40 10/40 12/40 13/40 14/40
Approximation for n > 40: 1.52/√n 1.73/√n 1.92/√n 2.15/√n 2.30/√n

Source: Adapted by permission from Tables 2 and 3 of Z. W. Birnbaum and R. A. Hall, Small sample distributions for multisample statistics of the Smirnov type, Ann. Math. Stat. 31 (1960), 710-720.

*This table gives the values of $D^{+}_{n,n,\alpha}$ and $D_{n,n,\alpha}$ for which $\alpha \ge P\{D^{+}_{n,n} > D^{+}_{n,n,\alpha}\}$ and $\alpha \ge P\{D_{n,n} > D_{n,n,\alpha}\}$ for some selected values of n and α.
Table ST9. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of Unequal Size*

One-Sided Test: α =
Two-Sided Test: α =
N1 = 1, N2 = 9
10
N1 = 2, N2 = 3
4
5
6
7
8
9
10
N1 = 3, N2 = 4
5
6
7
8
9
10
12
N1 = 4, N2 = 5
6
7
8
9
10
12
16
N1 = 5, N2 = 6
7
8
9
10
15
20


0.10 
0.20 


17/18 
9/10 
5/6 
3/4 
4/5 
5/6 
5/7 
3/4 
7/9 

7/10 
3/4 
2/3 
2/3 
2/3 
5/8 
2/3 
3/5 
7/12
3/5 
7/12

17/28 
5/8 
5/9 

11/20 
7/12
9/16 
3/5 
4/7 

11/20 
5/9 
1/2 
8/15 
1/2 





0.05 0.025 0.01 
0.10 0.05 0.02 
AIS 
5/6 
6/7 
7/8 7/8 
8/9 8/9 
4/5 9/10 
3/4 
4/5 4/5 
2/3 5/6 
5/7 6/7 6/7 
3/4 3/4 7/8
2/3 7/9 8/9
7/10 4/5 9/10
2/3 3/4 5/6 
3/4 4/5 4/5 
2/3 3/4 5/6 
5/7 3/4 6/7 
5/8 3/4 7/8 
2/3 3/4 7/9 
13/20 7/10 4/5
2/3 2/3 3/4 
5/8 11/16 3/4 
2/3 2/3 5/6 
23/35 5/7 29/35 
5/8 27/40 4/5 
3/5 31/45 7/9 
3/5 7/10 7/10
3/5 2/3 11/15 
11/20 3/5 7/10 


0.005 


0.01 


8/9 
9/10 
11/12 


5/6 
6/7 
7/8 
8/9 
4/5 
5/6 
13/16 
5/6 
6/7 
4/5 
4/5 
4/5 
11/15 
3/4 
Table ST9 (Continued)
One-Sided Test: α = 0.10 0.05 0.025 0.01 0.005
Two-Sided Test: α = 0.20 0.10 0.05 0.02 0.01
N1 = 6, N2 = 7 23/42 4/7 29/42 5/7 5/6
8 1/2 7/12 2/3 3/4 3/4
9 1/2 5/9 2/3 13/18 7/9
10 1/2 17/30 19/30 7/10 11/15
12 1/2 7/12 7/12 2/3 3/4
18 4/9 5/9 11/18 2/3 13/18
24 11/24 1/2 7/12 5/8 2/3
N1 = 7, N2 = 8 27/56 33/56 5/8 41/56 3/4
9 31/63 5/9 40/63 5/7 47/63
10 33/70 39/70 43/70 7/10 5/7
14 3/7 1/2 4/7 9/14 5/7
28 3/7 13/28 15/28 17/28 9/14
N1 = 8, N2 = 9 4/9 13/24 5/8 2/3 3/4
10 19/40 21/40 23/40 27/40 7/10
12 11/24 1/2 7/12 5/8 2/3
16 7/16 1/2 9/16 5/8 5/8
32 13/32 7/16 1/2 9/16 19/32
N1 = 9, N2 = 10 7/15 1/2 26/45 2/3 31/45
12 4/9 1/2 5/9 11/18 2/3
15 19/45 22/45 8/15 3/5 29/45
18 7/18 4/9 1/2 5/9 11/18
36 13/36 5/12 17/36 19/36 5/9
N1 = 10, N2 = 15 2/5 7/15 1/2 17/30 19/30
20 2/5 9/20 1/2 11/20 3/5
40 7/20 2/5 9/20 1/2
N1 = 12, N2 = 15 23/60 9/20 1/2 11/20 7/12
16 3/8 7/16 23/48 13/24 7/12
18 13/36 5/12 17/36 19/36 5/9
20 11/30 5/12 7/15 31/60 17/30
N1 = 15, N2 = 20 7/20 2/5 13/30 29/60 31/60
N1 = 16, N2 = 20 27/80 31/80 17/40 19/40 41/80
Large-sample approximation: $1.07\sqrt{(m+n)/(mn)}$ $1.22\sqrt{(m+n)/(mn)}$ $1.36\sqrt{(m+n)/(mn)}$ $1.52\sqrt{(m+n)/(mn)}$ $1.63\sqrt{(m+n)/(mn)}$

Source: Adapted by permission from F. J. Massey, Distribution table for the deviation between two sample cumulatives, Ann. Math. Stat. 23 (1952), 435-441.

*This table gives the values of $D^{+}_{m,n,\alpha}$ and $D_{m,n,\alpha}$ for which $\alpha \ge P\{D^{+}_{m,n} > D^{+}_{m,n,\alpha}\}$ and $\alpha \ge P\{D_{m,n} > D_{m,n,\alpha}\}$ for some selected values of $N_1$ = smaller sample size, $N_2$ = larger sample size, and α.
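Tables ST8 and ST9 are entered with the two-sample statistic, which is the largest gap between the two empirical distribution functions. The following is a minimal sketch that walks through the merged sorted samples. Here `ks_two_sample` is a hypothetical name, and $D^{+}_{m,n}$ is taken to be $\sup_x [F_m(x) - F_n(x)]$ with the first argument as the m-sample; that orientation is an assumption.

```python
def ks_two_sample(x, y):
    """Return (D+_{m,n}, D_{m,n}) with D+ = sup(F_m - F_n) and D = sup|F_m - F_n|."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    i = j = 0
    d_plus = d = 0.0
    # Step through the distinct pooled values; once either sample is used up,
    # |F_m - F_n| can only shrink, so the loop may stop there.
    while i < m and j < n:
        v = min(xs[i], ys[j])
        while i < m and xs[i] == v:
            i += 1
        while j < n and ys[j] == v:
            j += 1
        diff = i / m - j / n  # F_m(v) - F_n(v)
        d_plus = max(d_plus, diff)
        d = max(d, abs(diff))
    return d_plus, d
```

For equal sample sizes the result is compared with Table ST8. For unequal sizes it is compared with Table ST9, or with the large-sample approximation $c\sqrt{(m+n)/(mn)}$ printed at the foot of that table.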
Table ST10. Critical Values of the Wilcoxon Signed-Rank Test Statistic*

















α
n 0.01 0.025 0.05 0.10
3 6 6 6 6 
4 10 10 10 9 
5 15 15 14 12 
6 21 20 18 17 
7 27 25 24 22 
8 34 32 30 27 
9 41 39 36 34 
10 49 46 44 40 
11 58 55 52 48 
12 67 64 60 56 
13 78 73 69 64 
14 89 84 79 73 
15 100 94 89 83 
16 112 106 100 93 
17 125 118 111 104 
18 138 130 123 115 
19 152 143 136 127 


20 166 157 149 140 


Source: Adapted by permission from Table 1 of R. L. McCornack, Extended tables of the Wilcoxon matched pairs signed-rank statistics, J. Am. Stat. Assoc. 60 (1965), 864-871.

*This table gives values of $t_\alpha$ for which $P\{T^{+} > t_\alpha\} \le \alpha$ for selected values of n and α. Critical values in the lower tail may be obtained by symmetry from the equation $t_{1-\alpha} = n(n+1)/2 - t_\alpha$.
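Under the null hypothesis, each of the $2^n$ sign patterns is equally likely. $T^{+}$ is then the sum of a random subset of $\{1, \ldots, n\}$, so its exact distribution follows from counting subset sums. The following is a minimal sketch that recovers $t_\alpha$ as the smallest t with $P\{T^{+} > t\} \le \alpha$. The function name is hypothetical.

```python
def signed_rank_critical(n, alpha):
    """Smallest t with P{T+ > t} <= alpha for the Wilcoxon signed-rank statistic."""
    max_t = n * (n + 1) // 2
    counts = [1] + [0] * max_t  # counts[t] = number of subsets of {1..n} with sum t
    for k in range(1, n + 1):
        for t in range(max_t, k - 1, -1):  # downward loop uses each rank at most once
            counts[t] += counts[t - k]
    total = 2 ** n
    tail = 0  # number of sign patterns with T+ > t
    best = max_t
    for t in range(max_t, -1, -1):
        if tail / total > alpha:
            break
        best = t
        tail += counts[t]
    return best
```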
Table ST11. Critical Values of the Mann-Whitney-Wilcoxon Test Statistic*





n 


m α 2 3 4 5 6 7 8 9 10
2 0.01 4 6 8 10 12 14 16 18 20 
0.025 4 6 8 10 12 14 15 17 19 

0.05 4 6 8 9 11 13 14 16 18 

0.10 4 5 7 8 10 12 13 15 16 

3 0.01 9 12 15 18 20 23 25 28
0.025 9 12 14 16 19 21 24 26 

0.05 8 11 13 15 18 20 22 25 

0.10 7 10 12 14 16 18 21 23 

4 0.01 16 19 22 26 29 32 36 
0.025 15 18 21 24 27 31 34 

0.05 14 17 20 23 26 29 32 

0.10 12 15 18 21 24 26 29 

5 0.01 23 27 31 35 39 43 
0.025 22 26 29 33 37 41 

0.05 20 24 28 31 35 38 

0.10 19 22 26 29 32 36 

6 0.01 32 37 41 46 51 
0.025 30 35 39 43 48 

0.05 28 33 37 41 45 

0.10 26 30 34 38 42 

7 0.01 42 48 53 58
0.025 40 45 50 55 

0.05 37 42 47 52 

0.10 35 39 44 48 

8 0.01 54 60 66 
0.025 50 56 62 

0.05 48 53 59 

0.10 44 49 55 

9 0.01 66 73 
0.025 63 69 

0.05 59 65 

0.10 55 61 

10 0.01 80 
0.025 76 

0.05 72 

0.10 67 


Source: Adapted by permission from Table 1 of L. R. Verdooren, Extended tables of critical values for Wilcoxon's test statistic, Biometrika 50 (1963), 177-186, with the kind permission of Professor E. S. Pearson, the author, and the Biometrika Trustees.

*This table gives values of $u_\alpha$ for which $P\{U > u_\alpha\} \le \alpha$ for some selected values of m, n, and α. Critical values in the lower tail may be obtained by symmetry from the equation $u_{1-\alpha} = mn - u_\alpha$.
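Under the null hypothesis, every choice of m rank positions out of m + n is equally likely. The following minimal sketch enumerates those choices, which is feasible for table-sized samples, and computes $U = W - m(m+1)/2$, where W is the rank sum of the m-sample. It then returns the smallest u with $P\{U > u\} \le \alpha$. The function name is hypothetical.

```python
import math
from itertools import combinations

def mww_critical(m, n, alpha):
    """Smallest u with P{U > u} <= alpha for the Mann-Whitney U (exact enumeration)."""
    counts = [0] * (m * n + 1)
    for ranks in combinations(range(1, m + n + 1), m):
        counts[sum(ranks) - m * (m + 1) // 2] += 1  # U = rank sum minus its minimum
    total = math.comb(m + n, m)
    tail = 0  # number of rank configurations with U > u
    best = m * n
    for u in range(m * n, -1, -1):
        if tail / total > alpha:
            break
        best = u
        tail += counts[u]
    return best
```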
Table ST12. Critical Points of Kendall's Tau Test Statistic*














α
n 0.100 0.050 0.025 0.010
3 3 3 3 3 
4 4 4 6 6 
5 6 6 8 8 
6 7 9 11 11 
7 9 11 13 15
8 10 14 16 18 
9 12 16 18 22 
10 15 19 21 25 





Source: Adapted by permission from Table 1, p. 173, of M. G. Kendall, Rank Correlation Methods, 3rd ed., Charles Griffin, London, 1962. For values of n ≥ 11, see W. J. Conover, Practical Nonparametric Statistics, Wiley, New York, 1971, p. 390.

*This table gives the values of $S_\alpha$ for which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2}T$, for some selected values of α and n. Values in the lower tail may be obtained by symmetry, $S_{1-\alpha} = -S_\alpha$.
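For small n, the exact null distribution of $S = \binom{n}{2}T$ can be obtained by enumerating all n! equally likely rankings. S equals the number of concordant pairs minus the number of discordant pairs. The following is a minimal sketch that returns the smallest attainable s with $P\{S > s\} \le \alpha$. The function names are hypothetical, and the approach is practical only up to about n = 9.

```python
from itertools import permutations

def kendall_s(perm):
    """S = (#concordant pairs) - (#discordant pairs) when the x-ranks are 0, 1, ..., n-1."""
    n = len(perm)
    return sum(1 if perm[j] > perm[i] else -1
               for i in range(n) for j in range(i + 1, n))

def kendall_critical(n, alpha):
    """Smallest attainable s with P{S > s} <= alpha under independence."""
    counts = {}
    for perm in permutations(range(n)):
        s = kendall_s(perm)
        counts[s] = counts.get(s, 0) + 1
    total = sum(counts.values())
    tail = 0  # number of rankings with S strictly above the current value
    best = None
    for s in sorted(counts, reverse=True):
        if tail / total > alpha:
            break
        best = s
        tail += counts[s]
    return best
```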


Table ST13. Critical Values of Spearman's Rank Correlation Statistic*














α

n 0.01 0.025 0.05 0.10 
3 1.000 1.000 1.000 1.000 
4 1.000 1.000 0.800 0.800 
5 0.900 0.900 0.800 0.700 
6 0.886 0.829 0.771 0.600 
7 0.857 0.750 0.679 0.536 
8 0.810 0.714 0.619 0.500 
9 0.767 0.667 0.583 0.467 
10 0.721 0.636 0.552 0.442 








Source: Adapted by permission from Table 2, pp. 174-175, of M. G. Kendall, Rank Correlation Methods, 3rd ed., Charles Griffin, London, 1962. For values of n ≥ 11, see W. J. Conover, Practical Nonparametric Statistics, Wiley, New York, 1971, p. 391.

*This table gives the values of $R_\alpha$ for which $P\{R > R_\alpha\} \le \alpha$ for some selected values of n and α. Critical values in the lower tail may be obtained by symmetry, $R_{1-\alpha} = -R_\alpha$.
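The entries of Table ST13 can be checked in the same way by enumerating all n! rankings and computing $R = 1 - 6\sum D_i^2/[n(n^2-1)]$ for each. The following is a minimal sketch that uses exact fractions so that tied values of R group correctly. The function name is hypothetical, and the approach is practical only for small n.

```python
import math
from fractions import Fraction
from itertools import permutations

def spearman_critical(n, alpha):
    """Smallest attainable r with P{R > r} <= alpha for Spearman's R under independence."""
    denom = n * (n * n - 1)
    counts = {}
    for perm in permutations(range(1, n + 1)):
        d2 = sum((i + 1 - p) ** 2 for i, p in enumerate(perm))  # sum of squared rank differences
        r = 1 - Fraction(6 * d2, denom)  # exact rational value of R
        counts[r] = counts.get(r, 0) + 1
    total = math.factorial(n)
    tail = 0  # number of rankings with R strictly above the current value
    best = None
    for r in sorted(counts, reverse=True):
        if tail / total > alpha:
            break
        best = r
        tail += counts[r]
    return float(best)
```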
Answers to Selected Problems


Problems 1.3 

1. (a) Yes; (b) yes; (c) no. 2. (a) Yes; (b) no; (c) no.

6. (а) 0.9; (b) 0.05; (c) 0.95. 7. 1/16. 8. + $1n2 = 0.487. 
Problems 1.4 


3. ©) ee " (x) 4. 352146 5. (n-k + d 

6.1 — 4P5/45 8. жү e» S 

12. (a) 4/ ($) ® 94) / (2) (с) 13 (7) 1 a 
о0о) 09) 


© poa - 4-9] / (2): (2) ()e/ 3 


» (9009/6) 0 )»/6) 





Problems 1.5 
z £ 
3. a(pby' x ( i ) [pd — by. 4. p/(2— p). 
3777 7А Suny = "1 for large N 6.n=4 
А = 4 n= &. 
per; 4-9 n+2 
10. r/(r + g). 11. (a) 1/4; (b) 1/3. 12. 0.08. 
13. (a) 173/480; (b) 108/173, 15/173. 14. 0.0872. 




Problems 1.6 


1.1/2 — р); (1 — р)/0 — p). 4. PA — р)2[3 ~ 7p(1 — p)]. 
12. For any two disjoint intervals /;, 2 С (а, b), (IDEC) = (b — aM(I, N D), where 
£(1) = length of interval /. 


13. (а) p Mn ifn = 1 КОРТ 
* п = n-2 n-2 2 : 
27 Q2 D ^ GO 2 D Gne 
(c) 12/362 (2) (3) (3) 2 09)" * (9) (5) 2 3" G) form = 2,3... 
Problems 2.2 
3. Yes; yes. 


4.9; (1,1, 1, 1,2), (1, 1, 1,2, 1), (1, 1,2, 1, D, (1,2, L, L, D, (2, 1, 1, 1, DE (6,6, 6, 6, 6)}; 
((6, 6, 6, 6, 6), (6, 6, 6, 6, 5), (6, 6, 6, 5, 6), (6, 6, 5, 6, 6), (6, 5, 6, 6, 6), (5, 6, 6, 6, 6)}. 
5. Yes; (1/4, 1/2) U (3/4, 1). 


Problems 2.3 


x 0 1 2 3
P(X = x) 1/8 3/8 3/8 1/8

$F(x) = 0$, $x < 0$; $= 1/8$, $0 \le x < 1$; $= 1/2$, $1 \le x < 2$; $= 7/8$, $2 \le x < 3$; $= 1$, $x \ge 3$.
3. (a) Yes; (b) yes; (c) yes; yes. 





Problems 2.4 
1. — py -A~ p", Nn. 
; —; 1/x?; (d) e7*. 
2. (b) x x (c) 1/x^; (d) e 
3. Yes; $F_\theta(x) = 0$, $x < 0$; $= 1 - e^{-\theta x} - \theta x e^{-\theta x}$ for $x \ge 0$; $P(X > 1) = 1 - F_\theta(1)$.
4. Yes; FG) - 0x «6,2 1- ( Z) orao 
6. $F(x) = e^{x}/2$ for $x < 0$, $= 1 - e^{-x}/2$ for $x \ge 0$.
8. (c), (d), and (f). 
9. Yes; (a) 1/2, $0 < x < 1$; 1/4 for $2 < x < 4$; (b) $1/(2\theta)$, $|x| < \theta$; (c) $xe^{-x}$, $x > 0$; (d) $(x - 1)/4$ for $1 \le x < 3$, and $P(X = 3) = 1/2$; (e) $2xe^{-x^2}$, $x > 0$.
10. If S(x) = 1 — F(x) = Р(Х > х), then S’(x) = — f(x). 





Problems 2.5 


2. x ®1/Х. 
4. Ө[1 — exp(— —2n0)] Д = Уе —6 arc COS у + е7276+6 arc cos »]. [у <1. 
0 exp{—6 arctan z)[(1 f —e 97-1, 2> 0, 
Ө exp(—76 — arctan z)[(1 + 22)(1 — e7?7)]^!, 2 < 0. 
10. $f_{|X|}(y) = 2/3$ for $0 < y < 1$, $= 1/3$ for $1 < y < 2$.
12. (a) 0, у < 0; F(0) for —1 < y < 1, and 1 for y > 1; 
(b) = Oif y < —b, = F(—b) if y = —b, = F (y) if —b < у <Б, = 1 if y > b; 
(c) $= F(y)$ if $y < -b$, $= F(-b)$ if $-b \le y < 0$, $= F(b)$ if $0 \le y < b$, $= F(y)$ if $y \ge b$.


Problems 3.2 


3. ЕХ? —0  if2r « 2m — 1 is an odd integer, 
r m-r4l r(r«4 , R В 
~ if 2r < 2m — 1 is an even integer. 
Tones 


9. 3p = a(1 — v)/v where v = (1 — p)”. 
10. Binomial: $\alpha_3 = (q - p)/\sqrt{npq}$, $\alpha_4 = 3 + (1 - 6pq)/(npq)$;
Poisson: $\alpha_3 = \lambda^{-1/2}$, $\alpha_4 = 3 + 1/\lambda$.


Problems 3.3 


1. (b) e?^(e* — 0/0 — e?) (с) РП — gs)" А — qs) qt], s < 1/4. 
6. f(6s)/f (0), f (0е')/7(0). 


Problems 3.4 
Э а? с? x? 
3. For any o > 0 take P(X =x) = ao? (x= -£ = aiu z 0. 
af K? — иа c^[K? = 1р 
5. P | X = ————— | = — r 1<К 2, 
( caca) из + K*a* — 2K?o* SED 
2 $ 9 Ba — a* 
Р(Х? = К = —— r. 
‹ о) щл + K^a* — 2К?о* 
Problems 4.2 
1. No. 4. 1/6; 0. 7. Marginals negative binomial, so also conditionals. 


8. h(ybo) = 1(с? + x2)/( + x? + у?)%?, 

9. X ~ В(р\, р + рз); Y/O — х) ~ В(рз, рз). 

10. $X \sim G(\alpha, 1/\beta)$, $Y \sim G(\alpha + \gamma, 1/\beta)$, $X/Y \sim B(\alpha, \gamma)$, $Y - X \sim G(\gamma, 1/\beta)$.
14. X x 7) =1—е77. 15. 1/24; 15/16. 17.16. 


Problems 4.3 


3. No, yes, no.
10. $= 1 - a/(2b)$ if $a \le b$, $= b/(2a)$ if $a > b$.
11. $\lambda/(\lambda + \mu)$, 1/2.


Problems 4.4 


2. (b) $f_{V|U}(v \mid u) = 1/(2u)$, $|v| < u$, $u > 0$.

6. Р(Х =x, M = m) = л(1– xy"[1 — (1 —z)^*!J f x =m, —x?(1 — т)" 
if x < m. P(M =m) = 2л(1 — л)" — z(2 — x)(1 — л)", m > 0. 

7. fx(x) = Me kl, k Ex <k +1, k =0,1,2,.... 

11. $f_U(u) = 3u^2/(1 + u)^4$, $u > 0$.
2
13. (a) Fy,v (u, v) = [i — exp (-z5:)] (* 22 *) if u > 0, |у € 2/2, 


202 2 
= 1—ехр[1 — u*/(207)] if u > 0, v > 2/2, = 0 elsewhere; 
0102-1-02 
b ,v) = —e"—————. 
PEERU Мт Ta/ÐV2 
Problems 4.5 
ió 2+1 2+2 
.ЕХҮ=-————— + M. 3. X,Y)—0;X,Yd dent. 
e (E3910 1 ЗЕ C+D ы) om 


15. My,y (u, v) = a, — 2v)*! exp{u?/(1 — 2v)} for v < 1/2; p(U, V) = 0; no. 
18. рг у = (02 — о; 2) sin Ө cos @/./var(Z) var(W). 


EU? 
21. If U has pdf f, then EX” = EU"/(m+ 1) form > 0; р = 5 — 


nee c E REN 
> 8 var(U) + (ЕШ)? 


Problems 4.6 


a-p b-p b-p а-н 3 
Lutolf (=+) -f (=) уе(—=^ = )- o (=+) ] where Ф is the standard 
normal df. 
2. (a) 2(1 + X). 3. Е{Х|у} = ш + р (у — m). 4. E(var(Y|X)). 
6.4/9. 7. (а)1; (Ф) 1/4. 8. x*/(k +1), 1/0 + ЕЮ). 


Problems 4.7 


5. (a) (Хи) /В; ©. 


jel 


Problems 5.2 


— N 
sno (5) (Me PO -»-(y A) (py eo 2 Mt bean 


— т)! 
PY = М) = м} Р(х... ха] = у) = а 
i=1,...,j,x ху fori F j. 
9. PO = x) = qp' + рд", х > 1; PO = х) = pg + q'p xz 
P(Y, = х) = P(Y, = x) for n odd; = P(Y; = x) for n even. 


0O<x<y, 


Problems 5.3 
2. (a) P{ F(X) = жы J» d= p+} = (") p'(0— р)", х = 0,1,...,я 


la; 
эе(ў wp Data): 
22. $X/Y \sim C(1, 0)$; $(2/\pi)(1 + z^2)^{-1}$, $0 < z < \infty$.
27. (a) 1/02; (c) $= 0$ if $t < \theta$, $= \alpha/t$ if $t > \theta$; (d) $(\alpha/\beta)t^{\alpha - 1}$.
29. (b) 1/(2./m), 1/2. 
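Answer 22 of Problems 5.3 can be checked by simulation. This assumes, as the Cauchy answer suggests, that $X$ and $Y$ are independent $N(0, 1)$. Integrating $(2/\pi)(1 + z^2)^{-1}$ from 0 to $z$ gives $(2/\pi)\arctan z$, so the simulated proportion of $|X/Y| \le z$ should be close to that value. The seed, sample size, and function names are arbitrary choices for illustration.

```python
import math
import random


def ratio_cdf_estimate(z, n=100_000, seed=1):
    """Monte Carlo estimate of P(|X/Y| <= z) for independent standard normal X, Y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # |X/Y| <= z is the same event as |X| <= z|Y|; this avoids dividing by y
        if abs(x) <= z * abs(y):
            hits += 1
    return hits / n


def ratio_cdf_exact(z):
    # Integral of (2/pi)(1 + t^2)^(-1) over 0 < t < z
    return 2.0 / math.pi * math.atan(z)


for z in (0.5, 1.0, 3.0):
    print(z, ratio_cdf_estimate(z), ratio_cdf_exact(z))
```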
Problems 5.4


1. (a) ил = 4, ил = 15/4, p = —3/4; (YN (6- 2x, $); (c) 03191. 


4. BN (aui + b, сиз + d, а?о?, с2о2, p). 6. tan? 0 = ЕХ?/ЕҮ?. 7.02 = оў. 


Problems 6.2 


1. No. 2. Yes.
3.Y,— Y ~ F(y) = 01у < 0, = 1— e^? if y 0. 
4. F(y) = Oif y < 0, = 1 е7? if y > 0. 
9. $C(1, 0)$. 12. No.
13. (a) $\exp(-x^{-\alpha})$, $x > 0$; $EX^k = \Gamma(1 - k/\alpha)$, $k < \alpha$;
(b) $\exp(-e^{-x})$, $-\infty < x < \infty$; $M(t) = \Gamma(1 - t)$, $t < 1$;
(c) $\exp\{-(-x)^\alpha\}$, $x < 0$; $EX^k = (-1)^k\Gamma(1 + k/\alpha)$, $k > -\alpha$.
20. (a) Yes, no; (b) yes, no.
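The MGF in answer 13(b) of Problems 6.2 can be checked numerically. A sketch: if $U$ is uniform on $(0, 1)$, then $X = -\ln(-\ln U)$ has DF $\exp(-e^{-x})$ (inverse-transform sampling), so the sample average of $e^{tX}$ should approach $\Gamma(1 - t)$ for $t < 1$. The seed, the sample size, and $t = 0.25$ are arbitrary choices; $t$ is kept below $1/2$ so that $e^{tX}$ has finite variance. The function names are illustrative.

```python
import math
import random


def gumbel_sample(rng):
    """One draw from F(x) = exp(-e^{-x}) by inverse transform: x = -ln(-ln u)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and ln(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def mgf_estimate(t, n=100_000, seed=2):
    """Sample average of exp(t X) over n simulated draws of X."""
    rng = random.Random(seed)
    return sum(math.exp(t * gumbel_sample(rng)) for _ in range(n)) / n


t = 0.25
print(mgf_estimate(t), math.gamma(1 - t))
```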


Problems 6.3 
3. Yes; A, = n(n + 1)0/2, B, = o n(n + 1)(2п + 1)/6. 


5. (а) М„(ї) ^ Oas n > оо, по; (Б) М, (г) diverges as и — оо; 
(с) yes; (4) уеѕ; (е) М, > e°’, no. 


Problems 6.4 
1. (a) No; (b) no. 2. No. 3. For $\alpha < 1/2$. 7. (a) Yes; (b) no.


Problems 6.5 


4. Degenerate at В. 5. Degenerate at 0. 
6. For p > 0, N(O, ./p), and for p < 0, S,/n ate degenerate. 


Problems 6.6 


1. (b) No; (c) yes; (d) no.
2. N (0,1). 3. М0, с2/82). 4.163 8. 0.0926; 1.92. 


Problems 7.2 


1. $P(\bar X = 0) = P(\bar X = 1) = 1/8$, $P(\bar X = 1/3) = P(\bar X = 2/3) = 3/8$,
$P(S^2 = 0) = 1/4$, $P(S^2 = 1/3) = 3/4$.


x̄       1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
p(x̄)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
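Both answers of Problems 7.2 can be reproduced by brute-force enumeration. The problem statements are not reproduced here, so the settings are assumptions that match the stated probabilities: answer 1 is read as a sample of size 3 from Bernoulli(1/2), and the table as the mean of two rolls of a fair die. The function names are illustrative. Exact fractions keep the probabilities comparable with the tabulated values.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution(values, n, statistic):
    """Exact distribution of statistic(sample) over all len(values)**n equally likely
    ordered samples of size n drawn with replacement from `values`."""
    samples = list(product(values, repeat=n))
    counts = Counter(statistic(s) for s in samples)
    return {k: Fraction(c, len(samples)) for k, c in sorted(counts.items())}


def sample_mean(s):
    return Fraction(sum(s), len(s))


def sample_variance(s):
    # S^2 with divisor n - 1
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)


# Assumed setting for answer 1: n = 3 from Bernoulli(1/2)
bern_mean = sampling_distribution([0, 1], 3, sample_mean)
bern_var = sampling_distribution([0, 1], 3, sample_variance)

# Assumed setting for the table: mean of n = 2 rolls of a fair die
dice_mean = sampling_distribution(range(1, 7), 2, sample_mean)
print(bern_mean, bern_var, dice_mean, sep="\n")
```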





Problems 7.3 
1. $\{F(\min(x, y)) - F(x)F(y)\}/n$.
6. E(S*)F = c rn — D(n- 2)... (n 2k — 3), k > 1.
9. (a) $P(\bar X = t) = e^{-n\lambda}(n\lambda)^{nt}/(nt)!$, $t = 0, 1/n, 2/n, \ldots$; (b) $C(1, 0)$;
(c) $G(nm/2, 2/n)$. 10. (b) $2/\sqrt{\alpha n}$; $3 + 6/(\alpha n)$.


= А? — 
11.0, 1,0, E(X, —0.5)*/(144n?). 12. var(S?) = 1 (^ + eA) > var(X). 
= 
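Answer 1 of Problems 7.3 gives the covariance of the sample DF at two points. It can be verified exactly on a small example. The sketch below assumes, purely for illustration, a discrete uniform population on {0, 1, 2, 3}. It enumerates every ordered sample of size $n$ and compares the exact covariance of $F_n(x)$ and $F_n(y)$ with $\{F(\min(x, y)) - F(x)F(y)\}/n$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def edf(sample, t):
    # Sample DF: proportion of observations <= t
    return Fraction(sum(1 for s in sample if s <= t), len(sample))


def exact_cov_edf(support, n, x, y):
    """Cov(F_n(x), F_n(y)) by enumerating every equally likely ordered sample of
    size n drawn with replacement from a discrete uniform distribution on `support`."""
    samples = list(product(support, repeat=n))
    w = Fraction(1, len(samples))
    ex = w * sum(edf(s, x) for s in samples)
    ey = w * sum(edf(s, y) for s in samples)
    exy = w * sum(edf(s, x) * edf(s, y) for s in samples)
    return exy - ex * ey


def cov_formula(cdf, n, x, y):
    # {F(min(x, y)) - F(x) F(y)} / n
    return (cdf(min(x, y)) - cdf(x) * cdf(y)) / n


support = [0, 1, 2, 3]


def cdf(t):
    # Population DF of the assumed discrete uniform distribution
    return Fraction(sum(1 for v in support if v <= t), len(support))


print(exact_cov_edf(support, 3, 1, 2), cov_formula(cdf, 3, 1, 2))
```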


Problems 7.4 


2. n(m + 8)/[m(n — 2)]; 2n? ((m + 8)? + (n — 2)(m + 25))/Em? (n 2)* (n — 4)]. 
n-1 


ied 
ar) п> 1;——=—(1+82)— [68 E п> 2. 
RN VEL 


11. 2m"? ^2 (n + met einem в ( ) —00 < Z < 00. 





m n 
252 
Problems 7.5 
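1. (a) $AN(\mu^2, 4\mu^2\sigma_n^2)$ for $\mu \ne 0$, $\bar X^2/\sigma_n^2 \xrightarrow{L} \chi^2(1)$ for $\mu = 0$, $\sigma_n^2 = \sigma^2/n$;
(b) for $\mu \ne 0$, $1/\bar X \sim AN(1/\mu, \sigma_n^2/\mu^4)$; for $\mu = 0$, $\sigma_n/\bar X_n \xrightarrow{L} 1/N(0, 1)$;
(c) for $\mu \ne 0$, $\ln|\bar X| \sim AN(\ln|\mu|, \sigma_n^2/\mu^2)$; for $\mu = 0$, $\ln(|\bar X|/\sigma_n) \xrightarrow{L} \ln|N(0, 1)|$;
(d) $AN(e^\mu, e^{2\mu}\sigma_n^2)$.
2. $c = 1/2$ and $\sqrt{X} \sim AN(\sqrt{\lambda}, 1/4)$.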


Problems 7.6 


2 ү =. = 
Lt(n—1). 2.10т +п – 2). („=“) (а) /г(®—). 


Problems 7.7 








TEER -(n/24-1) 
yi +y us Бой. 


L „2үү—1/2 
3. or (1 — p^) Г + a -p9 


4. /n — AT ~t(n- 1). 


Problems 8.3 
7. $f_{\theta_1}(x)/f_{\theta_0}(x)$. 9. No. 10. No.


П.) Хо; (е) (Ж, 52); (в) (й Xi, П = хә); (hb) Xl Xo... Xo). 
1 1 
Problems 8.4 
n-i n+p-1 
"Cz eu) 
5 Yy 2 ro (ty 2 
r( 


2 LE) 2 (42) 
MM dite Га 
2 2 
28%


3.52 = 21552, уаг(52) = (21)? 207 < уаг(52) = 2, 4. №. 5. №. 


во (15) 0) озуп ®=()/(%) if0<t<s; 
=2/(") ifr =s,and CA) ifs+i<t<n. 


Gea сеа е 11. (а) NX/n; (b) no. 


12.1 = tx, 1 - (1— 9)" ift > to, and 1 ift < fo. 

13. (a) With: = =J xj bap б яп? 5 (ш fon n,t>s; (с) (1 —– 1/п)/; 
(ü-l/m-nee | 

14. With t = x), [^y (t) — (t — D^y( — D]/I" — (t — D") > 1. 


15. With t = Ух, () eta- g. 


Problems 8.5 


1. (a), (c), (d) Yes; (b) no. 2. 0.64761/n?.
3. n~! sup(x?/[e* — 1). 5.26(1 —0)/n. 


x#0 
Problems 8.6 
2. $\hat\beta = (n - 1)S^2/(n\bar X)$, $\hat\alpha = \bar X/\hat\beta$. 3. $\hat\mu = \bar X$, $\hat\sigma^2 = (n - 1)S^2/n$.
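4. $\hat\alpha = \bar X(\bar X - \overline{X^2})/(\overline{X^2} - \bar X^2)$, $\overline{X^2} = \sum X_i^2/n$, $\hat\beta = (1 - \bar X)(\bar X - \overline{X^2})/(\overline{X^2} - \bar X^2)$.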
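5. $\hat\mu = \ln\left(\bar X^2/\sqrt{\overline{X^2}}\right)$, $\hat\sigma^2 = \ln\left(\overline{X^2}/\bar X^2\right)$, where $\overline{X^2} = \sum X_i^2/n$.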
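The method-of-moments estimators of Problems 8.6 equate sample moments with population moments. A minimal sketch for two of the cases follows. It assumes a gamma model with mean $\alpha\beta$ and variance $\alpha\beta^2$, matching the form of $\hat\beta$ in answer 2, and a lognormal model with $EX = e^{\mu + \sigma^2/2}$ and $EX^2 = e^{2\mu + 2\sigma^2}$. The function names are illustrative.

```python
import math
from statistics import fmean


def moments(xs):
    # First two sample moments about the origin: X-bar and (sum X_i^2)/n
    return fmean(xs), fmean([x * x for x in xs])


def gamma_mom(xs):
    """Assumes G(alpha, beta) with mean alpha*beta and variance alpha*beta**2."""
    m1, m2 = moments(xs)
    v = m2 - m1 ** 2  # equals (n - 1) S^2 / n
    beta_hat = v / m1
    alpha_hat = m1 / beta_hat
    return alpha_hat, beta_hat


def lognormal_mom(xs):
    """Solves exp(mu + sigma^2/2) = m1 and exp(2 mu + 2 sigma^2) = m2."""
    m1, m2 = moments(xs)
    sigma2_hat = math.log(m2 / m1 ** 2)
    mu_hat = math.log(m1 ** 2 / math.sqrt(m2))
    return mu_hat, sigma2_hat


print(gamma_mom([1, 2, 3, 4]), lognormal_mom([1, 2, 3, 4]))
```

Because both fits match the first two moments exactly, substituting the estimates back into the population moment formulas recovers $\bar X$ and $\sum X_i^2/n$.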
Problems 8.7 


1. (a) med(X;); (Б) Хау; (о) пу У Ха; (0) —n/ УІ Xj). 
2. (a) $X/n$; (b) $\hat\theta = 1/2$ if $X < 1/2$, $= X$ if $1/2 \le X \le 3/4$, $= 3/4$ if $X > 3/4$;


Өө, ifX>0 ^ X —..X 
um | #X 20 ee ci Ne 
à = -% -JX +% }2, Х2 = Y: Х?/п; 


(6 = —^ - i if n1, n3 > 0; = any value in (0,1) if n; = n3 = 0; 
no mle if $n_1 = 0$, $n_3 \ne 0$; no mle if $n_1 \ne 0$, $n_3 = 0$;
()6=—-}+4V14+4x2, (ó-x 

3. à = — d! (т/п). 

4. (а) = Xo, B = У(Х, -à)nm (ЫА = Р, (Ху > 1) =e а < 1, 
=l, а> 1; A=1ifé>1, = exp{(@ — 1)/B} ifà < 1. 

5. $\hat\lambda = 1/\bar X$. 6. $\hat\mu = \sum \ln X_i/n$, $\hat\sigma^2 = \sum(\ln X_i - \hat\mu)^2/n$.

8. (а) Ñ = МЫ Хему —1; (b) Xan. 

9. $\hat\mu_i = \sum_j X_{ij}/n = \bar X_i$, $i = 1, 2, \ldots, s$, $\hat\sigma^2 = \sum_i\sum_j (X_{ij} - \bar X_i)^2/(ns)$.

11.4 — Х. 13.40) = (X/nY. 15. i = max(X, 0). 

16. $\hat p_j = X_j/n$, $j = 1, 2, \ldots, k - 1$.
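Two of the maximum likelihood estimators in Problems 8.7 have simple closed forms that are easy to compute. The sketch below assumes the exponential PDF $\lambda e^{-\lambda x}$, $x > 0$, for answer 5 (MLE $1/\bar X$) and a lognormal sample for answer 6. The lognormal MLEs are the mean and the divisor-$n$ variance of $\ln X_i$. The function names are illustrative.

```python
import math


def exponential_rate_mle(xs):
    # Assumes pdf lambda * exp(-lambda x), x > 0; the MLE is 1 / X-bar
    return len(xs) / sum(xs)


def lognormal_mle(xs):
    # MLEs: mean and divisor-n variance of ln X_i
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n
    return mu_hat, sigma2_hat


print(exponential_rate_mle([1.0, 2.0, 3.0]), lognormal_mle([math.exp(1), math.exp(3)]))
```

The divisor is $n$ rather than $n - 1$, so $\hat\sigma^2$ is not unbiased. This is the usual behavior of a variance MLE.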
Problems 8.8


2. (а) (Ex +/+); (AY 3X. s. Хуп. 


6. (X + D(X x n/( + 2)(n + 3)]. 8. (a + п) max(a, X«)/(a +n — 1). 


Problems 8.9 


5. (c) (n + 2)[ (Ky /2))— FP — (Xa) **? ]/(Ga + DEG /2) **? — Kye I. 
10. (ХХ) T (n + )/ T(n + 2k). 


Problems 9.2 


1.0.019, 0.857. — 2.k = uo azo/ /n, 1 Ф (za — А0 n). 
5. exp( —2), exp( 2/0), 0 > 1. 


Problems 9.3 


1. ф(х) = lif x < C — V1 — o), = 0 otherwise. 

4. ф(х) = lif [x] — 1| > k. 5. ф(х) = lif xq) > c = 60 — In(a 7"). 
11. If 6o < 01, ф(х) = 1 if xq) > Opa”, and if Ө, < Oo, then ф(х) = 1 
if хау < 09(1 — gl/r)-!, 


12. ф(х) = lifx < /a/20r» 1 — /a/2. 


Problems 9.4 


1. (a), (b), (c), and (d) have MLR in $\sum X_i$; (e) and (f) in $\prod X_i$.
4. Yes. 5. Yes, yes. 


Problems 9.5 


1. $(x1, x2) = lif |x; — x2| > c, = 0 otherwise, c = „242. 
2. $\varphi(x) = 1$ if $\sum x_i > k$. Choose $k$ from $\alpha = P_{H_0}(\sum X_i > k)$.


Problems 9.6 


3. $\varphi(x) = 1$ if (no. of $x_i$'s $> 0$) $-$ (no. of $x_i$'s $< 0$) $> k$.


Problems 10.2 


2. Y =# of xy, x; in sample, Y < ci or Y > с. З.Х < сог> с). 
4. 52 > C1 OF < С). 5. (а) Xin) > No; (b) Xin > No Or « c. 

6. |X — 69/2| > c. 7.(а) ХХ <сог> со; (bX»c. 

11. Ха) > A – In(a)!/". 12. Хо > Өоа!/". 


Problems 10.3 


1. Reject at $\alpha = 0.05$. 3. Do not reject $H_0: p_1 = p_2 = p_3 = p_4$ at 0.05 level.
4. Reject $H_0$ at $\alpha = 0.05$. 5. Reject at 0.10 but not at 0.05 level.


7. Do not reject $H_0$ at $\alpha = 0.05$. 8. Do not reject $H_0$ at $\alpha = 0.05$.
10. $U = 15.41$. 12. P-value = 0.5447.
Problems 10.4 


1. $t = -4.3$, reject $H_0$ at $\alpha = 0.02$. 2. $t = 1.64$, do not reject $H_0$.
5. $t = 5.05$. 6. Reject $H_0$ at $\alpha = 0.05$. 7. Reject $H_0$. 8. Reject $H_0$.


Problems 10.5 


1. Do not reject $H_0: \sigma_1 = \sigma_2$ at $\alpha = 0.10$.
3. Do not reject $H_0$ at $\alpha = 0.05$. 4. Do not reject $H_0$.


Problems 10.6 


2. (a) $\varphi(x) = 1$ if $\sum x_i = 5$, $= 0.12$ if $\sum x_i = 4$, $= 0$ otherwise;
(b) minimax rule rejects $H_0$ if $\sum x_i = 4$ or 5, and with probability 1/16 if $\sum x_i = 3$;
(c) Bayes rule rejects $H_0$ if $\sum x_i > 2$.

3. Reject $H_0$ if $\bar x < (1 - 1/n)\ln 2$;
$\beta(1) = P(Y < (n - 1)\ln 2)$, $\beta(2) = P(Z < (n - 1)\ln 2)$, where $Y \sim G(n, 1)$ and
$Z \sim G(n, 1/2)$.


Problems 11.3 





1. (77.7, 84.7). 2. $n = 42$. 7. (3 2m dua)


2n,a/2 
9. QX/Q — Ay), 2X/(2 — 43), А5 — А2 = 4(1— а). 10. (a^ N]. 


In(t /a) 
П.п > mardo 


12. Choose $k$ from $\alpha = (k + 1)e^{-k}$. 13. $\bar X + z_\alpha\sigma/\sqrt{n}$.

14. $(\sum X_i^2/c_2, \sum X_i^2/c_1)$ where $\int_{c_1}^{c_2}\chi^2_n(y)\,dy = 1 - \alpha$ and $\int_{c_1}^{c_2} y\,\chi^2_n(y)\,dy = n(1 - \alpha)$.
15. Posterior $B(n + \alpha, \sum x_i + \beta - n)$.

16. A(ulx) = Æ expl - (и — х)?)[Ф(/п(1 — x)) — &(—/n(1 + x))], where Ф 


is standard normal df. 
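Answer 12 of Problems 11.3 is read here as the equation $\alpha = (k + 1)e^{-k}$, which has no closed-form solution for $k$. The following is a minimal bisection sketch. It uses only the fact that $g(k) = (k + 1)e^{-k}$ has derivative $-ke^{-k} < 0$ for $k > 0$, so $g$ falls from $g(0) = 1$ towards 0 and the root is unique. The bracket $[0, 100]$, the tolerance, and the function name are arbitrary assumptions.

```python
import math


def solve_k(alpha, lo=0.0, hi=100.0, tol=1e-12):
    """Bisection for k > 0 with (k + 1) exp(-k) = alpha, 0 < alpha < 1.

    g(k) = (k + 1) exp(-k) is strictly decreasing for k > 0, so the root is
    unique; the bracket assumes g(hi) < alpha, true for any practical alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")

    def g(k):
        return (k + 1.0) * math.exp(-k)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > alpha:
            lo = mid  # g is still above alpha, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(solve_k(0.05))
```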


Problems 11.4 


1 (Хау — xi, / Qn), Xm). 

2. $(2n\bar X/b, 2n\bar X/a)$, choose $a, b$ from $\int_a^b \chi^2_{2n}(u)\,du = 1 - \alpha$ and $a^2\chi^2_{2n}(a) = b^2\chi^2_{2n}(b)$,
where $\chi^2_\nu(x)$ is the pdf of a $\chi^2(\nu)$ rv.

3. $(X/(1 - b), X/(1 - a))$, choose $a, b$ from $1 - \alpha = b^2 - a^2$ and $a(1 - a)^2 = b(1 - b)^2$.

4. n = [425 ,5/d?] + 1; n > (1/o) In(1/o). 


Problems 11.5 


1. $(X_{(n)}, \alpha^{-1/n}X_{(n)})$.
2. (2E Х,/А, 25 X; /А.) where А, Az are solutions of A} fina (Ан) = А fona (А) and 
P(1) =1—a, f, is x?(v) pdf.
2 
3. (Ха, nd Ма, Ха). 5. (a^ Xo, Xo). 8. Yes. 


Problems 12.3 


nE(-/xd 


$ р 60-90 
4, Reject Ho : ao = a, [ONU ACA > с 
1 0 E(Y; —àp-àtjY? /(n—2) 


8. Normal equations $\hat\beta_0\sum x_i^k + \hat\beta_1\sum x_i^{k+1} + \hat\beta_2\sum x_i^{k+2} = \sum Y_i x_i^k$, $k = 0, 1, 2$.
Reject Ho : fo = 0 if (lfol/ e/V EY; — Bo — Bix: — Bax?) > co where 


B; = Ec Y; and fy = Y — fix, В = E(x; — 3)(Y; -Y)/ EQ — х). 
10. (a) $\hat\beta_0 = 0.28$, $\hat\beta_1 = 0.411$; (b) $t = 4.41$, reject $H_0$.
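The normal equations of answer 8 of Problems 12.3 for the quadratic model $Y = \beta_0 + \beta_1 x + \beta_2 x^2$ form a $3 \times 3$ linear system in the moment sums $\sum x_i^j$. The sketch below builds that system and solves it by Gauss-Jordan elimination in exact fractions. The function name is illustrative, and the $x$ values must contain at least three distinct points, otherwise the system is singular.

```python
from fractions import Fraction


def quadratic_normal_equations(xs, ys):
    """Solve b0*S_k + b1*S_(k+1) + b2*S_(k+2) = T_k, k = 0, 1, 2, where
    S_j = sum x_i^j and T_k = sum Y_i x_i^k (least squares for Y = b0 + b1 x + b2 x^2)."""
    s = [sum(Fraction(x) ** j for x in xs) for j in range(5)]
    t = [sum(Fraction(y) * Fraction(x) ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[k], s[k + 1], s[k + 2], t[k]] for k in range(3)]  # augmented matrix
    for col in range(3):
        # Pick a row with a nonzero pivot (StopIteration here means a singular system)
        pivot = next(r for r in range(col, 3) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        # Eliminate this column from every other row
        for r in range(3):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return a[0][3], a[1][3], a[2][3]


print(quadratic_normal_equations([0, 1, 2, 3], [1, 6, 17, 34]))
```

The residuals of a least-squares fit satisfy $\sum (Y_i - \hat Y_i)x_i^k = 0$ for $k = 0, 1, 2$, which is exactly the normal equations restated. This gives a convenient check.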


Problems 12.4 


2. $F = 10.8$. 3. Reject at $\alpha = 0.05$ but not at $\alpha = 0.01$.
4. BSS = 28.57, WSS = 26, reject at $\alpha = 0.05$ but not at 0.01.
5. F = 56.45. 6. F = 0.87. 


Problems 12.5 


4. SS methods = 50, SS ability = 64.56, ESS = 25.44; reject $H_0$ at $\alpha = 0.05$, not at 0.01.
5. $F_{\text{variety}} = 24.00$.


Problems 12.6 


b,— 
А d am» 4j – У) 
2. Reject Но if —————————— > с 
EU EEE Or Yu 
4. SS, (machines) = 2.786, d.f. = 3; SSI = 73.476, d.f. = 6; 
SS; (machines) = 27.054, d.f. = 2; SSE = 41.333, d. f. = 24. 


5.
Source          d.f.   SS         F
Cities          3      227.27     4.22
Auto            3      3695.94    68.66
Interactions    9      9.28       0.06
Error           16     287.08
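The columns of answer 5 of Problems 12.6 can be cross-checked. The interpretation is an assumption: each row lists degrees of freedom, a sum of squares, and an F ratio, and the Cities entry is read as 227.27, treating the decimal point as lost. Under that reading, each F should equal $(SS/\text{d.f.})/(SSE/\text{d.f.}_{\text{error}})$. The function name is illustrative.

```python
def two_way_f_ratios(effects, sse, df_error):
    """F = (SS / d.f.) / (SSE / d.f._error) for each effect; effects maps name -> (d.f., SS)."""
    mse = sse / df_error
    return {name: (ss / df) / mse for name, (df, ss) in effects.items()}


effects = {"Cities": (3, 227.27), "Auto": (3, 3695.94), "Interactions": (9, 9.28)}
ratios = two_way_f_ratios(effects, sse=287.08, df_error=16)
for name, f in ratios.items():
    print(f"{name}: F = {f:.2f}")
```

The recomputed ratios agree with the tabulated F column (4.22, 68.66, 0.06) to two decimals. This supports the column reading above.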

Problems 13.2 


1. $d$ is estimable of degree 1; (number of $x_i$'s in $A$)/$n$.
2. (a) $(mn)^{-1}\sum X_i \sum Y_j$; (b) $S_1^2 + S_2^2$.
3. (a) $\sum X_iY_i/n$; (b) $\sum(X_i + Y_i - \bar X - \bar Y)^2/(n - 1)$.


Problems 13.3 


3. Do not reject $H_0$. 7. Reject $H_0$. 10. Do not reject $H_0$ at 0.05 level.
11. $T^+ = 133$, do not reject $H_0$.
12. (2nd part) $T^+ = 9$, do not reject $H_0$ at $\alpha = 0.05$.
Problems 13.4


1. Do not reject $H_0$. 2. (a) Reject; (b) reject.
3. $U = 29$, reject $H_0$. 5. $d = 1$, do not reject $H_0$.
7.1 = 313.5, = 3.73, reject; r = 10 or 12, do not reject at a = 0.05. 


Problems 13.5 


1. Reject $H_0$ at $\alpha = 0.05$. 4. Do not reject $H_0$ at $\alpha = 0.05$.
9. (а)г = 1.2; (b)r = 0.62; (с) reject Но in each case. 


Problems 13.6 
1. (а) 5; (b)8. 3. p"? (n + p — np) x 1. 


4.n > (zi y A po(1 — po) — zia P1 (1 — p0Y /(pi — ро)”. 


Problems 13.7 


1. (с) Е{п(Х — pP HES? = 1 +2p(1 —2p/n)^'; ratio = 1 if p = 0, > 1for p > 0. 
2. Chi-square test based on (c) is not robust for departures from normality. 
Contaminated normal, 651
Contingency table, 634 
Continuity correction, 302 
Continuity theorem, 290 
Continuous type distributions, 50 
Convergence: 
a.s., 265, 281, 283 
in distribution = weak, 256
in law, 256 
of MGFs, 289 
modes of, 256 
of moments, 257 
of PDFs, 259 
of PMFs, 258
in probability, 259 
in rth mean, 263 
Convolution of DFs, 141 
Correlation, 151, 346 
Correlation coefficient, 151, 346
properties of, 151 

Countable additivity, 7 

Covariance, 312 
sample, 346 

Coverage, elementary, 645 
r-coverage, 645 
probability, 528 

Credible sets, 539 

Critical region, 456 


Decision function, 424 
Degenerate RV, 49, 180 
Degrees of freedom when pooling classes, 507 
Delta method, 335 
Density function, probability, 50, 107 
Design matrix, 565 
Dichotomous trials, 181 
Discordance, 635 
Discrete distributions, 180 
Discrete uniform distribution, 182 
Dispersion matrix = variance-covariance
matrix, 247 
Distribution: 
conditional, 111 
conjugate prior, 43 
of a function of an RV, 57 
induced, 61 
a posteriori, 426 
a priori, 426 
of sample mean, 301, 320 
of sample median, 322 
of sample quantile, 338 
of sample range, 176 
Distribution function, 44, 45, 103 
continuity points of a, 44, 51 
of a continuous type RV, 50 
convolution, 141 
decomposition of a, 55 
discontinuity points of a, 44 
of a discrete type RV, 49 
of a function of an RV, 57 
of an RV, 45 
of multiple RVs, 103 
Domain of attraction, 294 


Efficiency of an estimator, 402 
relative, 402 

Empirical DF = sample DF, 310 

Equal likelihood, 1 

Equivalent RVs, 123 

Equivariant estimator, 356, 445 

Estimable function, 377, 599 

Estimate, 353 
Estimable parameter, 599


degree, 600, 605 
kernel, 600, 605 


Estimator, 353, 354 


equivariant, 356, 445 
Hodges-Lehmann, 657
James-Stein, 451 

L-, 657 

least squares, 563 

M-, 657 

minimum risk equivariant, 446 
Pitman, 448-449, 451—452 
point, 354 

R-, 657 


Event, 3 


certain, 9 

elementary = simple, 3

disjoint = mutually exclusive, 7, 35 
independent, 34 

null, 9 


Exchangeable random variables, 124, 156, 317 
Expectation, conditional, 165 


properties, 165 


Expected value = mean = mathematical


expectation, 69 
of a function of RV, 141 
of product of RVs, 154 
of sum of RVs, 154 


Exponential distribution, 135, 215 


characterizations, 215-217 
memoryless property of, 216 
MGF, 215

moments, 215 


Exponential family, 251 


k-parameter, 253 
natural parameters of, 254 
one-parameter, 251 


Extreme value distribution, 233 


Factorial moments, 86 

Finite mixture density function, 235 
Finite population correction, 318 
Fisher information, 393 
Fisher-Irwin test, 502 

Fisher's Z-statistic, 333 

Fitting of distribution, binomial, 511 


Geometric, 511 
normal, 505 
Poisson, 506 


Fréchet, Cramér, and Rao inequality, 391 
Fréchet, Cramér, and Rao lower bound, 


391 
binomial, 393 
normal, 397 
Fréchet, Cramér, and Rao inequality (cont.)
one-parameter exponential family, 396 
Poisson, 394 

F-distribution: 

central, 330, 341 
moments of, 330 
noncentral, 332 
moments of, 332 
F-test(s), 518 
of general linear hypothesis, 566 
as generalized likelihood ratio test, 496, 566 
for testing equality of variances, 518 


Gamma distribution, 212 
bivariate, 117 
characterizations, 216 
MGF, 212 
moments, 212 
relation with Poisson, 218 
Gamma function, 211 
General linear hypothesis, 561
canonical form, 567 
estimation in, 562 
GLR test of, 566 
Generalized likelihood ratio test, 491
asymptotic distribution, 498 
F-test as, 496, 566 
for general linear hypothesis, 566 
for parameter of, binomial, 492 
for simple vs. simple hypothesis, 491 
bivariate normal, 499 
discrete uniform, 499 
exponential, 499 
normal, 499 
Generating functions, 85-86 
moment, 87 
probability, 86 
Geometric distribution, 86, 172, 187 
characterizations, 189, 204 
memoryless property of, 189 
MGF, 187 
moments, 187 
order statistics, 172 
PGF, 86 
Glivenko-Cantelli theorem, 311 
Goodness-of-fit problem, 504—505 


Hazard (= failure rate) function, 237
Helmert orthogonal transformation, 342 
Hodges-Lehmann estimators, 657 
Hölder's inequality, 159
Hypergeometric distribution, 191 
bivariate, 117 
mean and variance, 191 
Hypothesis, tests of, 454
alternative, 455 
composite, 455 
null, 455 
parametric, 455 
simple, 455 


Identically distributed RVs, 123 
Implication rule, 12 
Inadmissible decision rule, 440 
Independence and correlation, 151 
Independence of events, 34 
complete = mutual, 35
pairwise, 35 
Independence of RVs, 119, 123 
complete = mutual, 122
pairwise, 122 
Independent, identically distributed RVs, 123 
sequence of RVs, 123 
Indicator function, 41 
Induced distribution, 61 
Infinitely often, 281 
Interactions, 590
Invariance, of hypothesis testing problem, 482 
principle, 484 
Invariant: 
decision problem, 443 
family of distributions, 442 
function, 445, 482 
location, 445 
location-scale, 445 
loss function, 443 
maximal, 482 
scale, 445 
statistic, 445, 482 
Invariant, class of distributions, 442 
maximal, 447, 482 
tests, 482 
UMP tests, 483 
Inverse Gaussian PDF, 238 


James-Stein estimator, 451 
Joint: 
DF, 103, 105 
PDF, 107 
PMF, 106 
Jump, 48, 106 
Jump point, of a DF, 48, 106 


Kendall's sample tau, 637 
distribution of, 637 
generating function, 95 

Kendall's tau coefficient, 636 

Kendall's sample tau test, 637 
Kernel, symmetric, 600, 605
Kolmogorov’s, inequality, 284 
strong law of large numbers, 288 
Kolmogorov-Smirnov one sample statistic, 608 
for confidence bounds of DF, 613 
distribution, 609, 610—611 
Kolmogorov-Smirnov test: 
comparison with chi-square test, 612 
one-sample, 611 
two-sample, 627 
Kolmogorov-Smirnov two sample statistic, 627
distribution, 628 
Kronecker lemma, 285 
Kurtosis, 85 


L-, M-, and R-estimators, 657 
Laplace(= double exponential) distribution, 93, 
234 
MGF, 94, 234
Least square estimation, 563 
principle, 563 
restricted, 563 
Level of a test, 456 
L'Hospital rule, 296
Likelihood: 
equal, 1 
equation, 410 
equivalent, 370 
function, 410 
Limit inferior, 12 
set, 12 
superior, 12 
Lindeberg central limit theorem, 298 
Lindeberg-Levy CLT, 296 
Lindeberg condition, 298 
Linear combinations of RVs, 154
mean, 154 
variance, 155 
Linear dependence, 151 
Linear model, 562 
Linear regression model, 564, 569 
confidence intervals, 573 
estimation, 570 
problem, 564 
testing of hypotheses, 571—572 
Locally most powerful test, 487 
Location family, 204 
Location-scale family, 204 
Logistic distribution, 232 
Lognormal distribution, 91, 231
Loss function, 355, 424 
Lower bound for variance, Chapman, 
Robbins, and Kiefer inequality, 397 
Fréchet, Cramér and Rao inequality, 391 
Lyapunov condition, 300
Lyapunov inequality, 99 


Maclaurin expansion of an MGF, 88, 94 
Mann-Whitney statistic, 629 
moments, 606-607
null distribution, 630 
Mann-Whitney-Wilcoxon test, 629 
Marginal: 
DF, 110 
PDF, 109 
PMF, 109 
Markov’s inequality, 96 
Maximal invariant statistic, 447, 482 
function of, 483 
Maximum likelihood estimation, principle of, 
410 
Maximum likelihood estimator, 410 
asymptotic normality, 419-420 
consistency, 419-420 
as a function of sufficient statistic, 415 
invariance property, 418 
Maximum likelihood estimation method applied 
to: 
Bernoulli, 413 
binomial, 422 
bivariate normal, 423 
Cauchy, 422 
discrete uniform, 411 
exponential, 418 
gamma, 415, 418 
geometric, 422 
hypergeometric, 412 
normal, 411 
Poisson, 421 
uniform, 412, 416 
Mean square error, 150, 354, 380 
Median, 82, 84 
Median test, 625 
Memoryless property: 
of exponential, 216 
of geometric, 189 
Method: 
of CF or MGF, 141 
of DFs, 128 
of transformations, 132 
Methods of finding confidence interval: 
Bayes, 538 
for large samples, 540 
pivot, 533 
test inversion, 535 
Method of moments, 406—407 
applied to: 
beta, 409 
binomial, 408
gamma, 409 
lognormal, 409 
normal, 409 
Poisson, 407 
uniform, 407 
Minimal sufficient statistic, 371 
for beta, 376 
for gamma, 376 
for geometric, 376 
for normal, 372, 440 
for Poisson, 376 
for uniform, 372, 375 
Minimax, estimator, 425 
principle, 425 
solution, 521 
Minimax estimation, for parameter of Bernoulli, 
425 
binomial, 436 
hypergeometric, 438 
Minimum mean square error estimator, 387 
for variance of normal, 387 
Minimum risk equivariant estimator, 446 
for location parameter, 448—449 
for scale parameter, 451-452 
Mixing proportions, 234 
Minkowski inequality, 160 
Mixture density function, 234 
Moment: 
about origin, 72 
absolute, 72 
central, 79 
condition, 75 
of conditional distribution, 165
of РЕ, 72 
factorial, 80 
of functions of multiple RVs, 149 
inequalities, 95 
lemma, 76 
non-existence of order, 77 
of sample covariance, 319 
of sample mean, 315 
of sample variance, 316 
Moment generating function, 87 
continuity theorem for, 290 
differentiation, 88 
existence, 89 
expansion, 88 
limiting, 289 
of linear combinations, 145 
and moments, 90 
of multiple RVs, 142 
of sample mean, 320 
series expansion, 88 




of sum of independent RVs, 145 
uniqueness, 88 
Moments, 69 
factorial, 86 
Monotone likelihood ratio, 472 
for hypergeometric, 475 
for one-parameter exponential family, 473 
UMP test for families with, 474 
for uniform, 473 
Most efficient estimator, asymptotically, 402 
as MLE, 417 
Most powerful test, 457 
for families with MLR, 474 
as a function of sufficient statistic, 466 
invariant, 483 
Neyman-Pearson, 464 
similar, 480 
unbiased, 479 
uniformly, 457 
Multidimensional RV = multiple RV, 102
continuous, 107 
discrete, 106 
Multinomial coefficient, 25 
Multinomial distribution, 198 
MGF, 198
moments, 199 
Multiple decision problem, 524 
Bayes solution, 524 
Multiple RV, 102 
continuous type, 107 
discrete type, 106 
functions of, 127 
Multiplication rule, 29 
Multivariate hypergeometric distribution, 200 
Multivariate negative binomial distribution, 201 
Multivariate normal, 245 
dispersion matrix, 247 


Natural parameters, 254 
Negative binomial (= Pascal or waiting time) 
distribution, 185 
bivariate, 117 
central term, 202 
mean and variance, 186 
MGF, 186
Negative hypergeometric distribution, 193 
mean and variance, 194 
Neyman-Pearson lemma, 464 
Neyman-Pearson lemma applied to: 
Bernoulli, 468 
normal, 470 
Noncentral, chi-square distribution, 326 
F-distribution, 332 
t-distribution, 329 
Noncentrality parameter, chi-square, 326
F-distribution, 332 
t-distribution, 329 
Noninformative prior, 432 
Nonparametric = distribution-free estimation, 599 
methods, 598 
Nonparametric unbiased estimation, 599 
of population mean, 601 
of population variance, 601 
of tail probability, 601 
Normal approximation: 
to binomial, 303 
to Poisson, 303 
Normal distribution = Gaussian law, 90, 226 
bivariate, 138, 170, 238 
characteristic function, 90 
characterizations, 229 
contaminated, 651, 654 
folded, 452 
as limit of beta, 298 
as limit of binomial, 303 
as limit of geometric, 297 
as limit of Poisson, 291, 303 
MGF, 226
moments, 227 
multivariate, 245 
singular, 242 
as stable distribution, 294 
standard, 115, 225 
tail probability, 228 
truncated, 115 
Normal equations, 563 


Odds, 8 
Order statistic, 171 
is complete and sufficient, 599 
joint PDF, 173 
joint marginal PDF, 175 
kth, 171 
marginal PDF, 174 
uses, 644 
moments, 177 
Ordered samples, 22 
Orders of magnitude, o and O notation, 
290 


Parameter(s), of a distribution, 69, 204, 598 
estimable, 599 
location, 204 
location-scale, 204 
order, 69, 80 
scale, 204 
shape, 204 
space, 354 
Parametric statistical hypothesis, 456
alternative, 455 
composite, 455 
null, 455 
problem of testing, 454 
simple, 455 
Parametric statistical inference, 306 
Pareto distribution, 84, 231 
Partition, 368 
coarser, 370 
finer, 370 
minimal sufficient, 370 
reduction of a, 370 
sets, 369 
sub-, 370 
sufficient, 369 
Permutation, 23 
Pitman estimator of: 
location, 448-449 
scale, 451-452 
Pitman’s asymptotic relative efficiency, 658 
Pivot, 533 
Point estimator, 354 
Poisson DF, as incomplete gamma, 218 
Poisson distribution, 58, 84, 194 
central term, 208 
characterizations, 195—196 
coefficient of skewness, 85 
kurtosis, 85 
as limit of binomial, 202 
as limit of negative binomial, 203 
mean and variance, 194 
MGF, 88
moments, 84 
PGF, 86 
truncated, 115 
Polya distribution, 192 
Pooled sample variance, 513 
Population, 306 
Population distribution, 307 
Posterior probability, 31 
Principle of: 
equivariance, 442, 445 
inclusion—exclusion, 10 
invariance, 484 
least squares, 563 
MLE, 410 
Prior probability, 31 
Probability, 7 
addition rule, 9 
axioms, 7 
conditional, 28 
continuity of, 14 
countable additivity of, 7 
Probability (cont.)
density function, 50 
distribution, 43 
equally likely assignment, 7 
on finite sample spaces, 21 
generating function, 86 
geometric, 14 
integral transformation, 208 
mass function, 48 
measure, 7 
monotone, 9 
multiplication rule, 29 
posterior and prior, 31 
principle of inclusion-exclusion, 10 
space, 2, 8 
subadditivity, 9 
subtractive, 9 
tail, 74 
total, 29 
uniform assignment of, 7 
Probability integral transformation, 208 
Problem: 
of location, 614 
of location and symmetry, 614 
of moments, 90 
P-value, 462 


Quadratic form, 238 
Quantile of order p = (100p)th percentile, 81


Random, 14, 16 
Random experiment = statistical experiment, 3
Random interval, 528 
coverage of, 528 
Random sample, 14, 23 
from a finite population, 23 
from a probability distribution, 14 
Random sampling, 307 
Random set, family of, 528 
Random variable(s), 41, 102 
bivariate, 106 
continuous type, 50, 107 
discrete type, 48, 106 
distribution of, 43 
degenerate, 49, 180 
equivalent, 123 
exchangeable, 124, 156, 317 
functions of a, 57 
multiple = multivariate, 102
standardized, 80 
symmetric, 71 
symmetrized, 125 
truncated, 115 
uncorrelated, 151 
Range, 176
Rank correlation coefficient, 640 
Rayleigh distribution, 233 
Realization of a sample, 307 
Rectangular distribution, 207 
Regression: 
model, 564, 569 
coefficient, 346 
function, 574 
linear, 564, 569 
Regularity conditions of FCR inequality, 
391 
Risk function, 355, 424 
Robust estimator(s), 657 
Robust test(s), 660 
Robustness: 
of chi-square test, 657 
of sample mean as an estimator, 651 
of sample standard deviation as an 
estimator, 652 
of Student's t-test, 655 
Robust procedure, defined, 650, 657 
Rules of counting, 22 
Run, 632 
Run test, 632 


Sample, 306, 307 
correlation coefficient, 313 
covariance, 312 
DF, 310 
mean, 308 
median, 313 
distribution of, 322 
MGF, 312 
moments, 311 
ordered, 22 
point, 3 
quantile of order p, 313, 338 
random, 307 
realization of, 307 
regression coefficient, 351 
space, 3, 307 
statistic, 307, 311 
standard deviation, 308 
variance, 308 
Sampling: 
from a finite population, 23, 308 
from an infinite population, 308 
simple random, 308 
Sample space, 3 
continuous, 3 
discrete, 3 
finite, 3 
uncountable, 3 
Sampling with and without replacement, 22-23,
308 
Sampling from bivariate normal, 344 
distribution of sample correlation coefficient, 
347 
independence of sample mean vector and 
dispersion matrix, 345 
Sampling from univariate normal, 339 
distribution of sample variance, 340 
independence of X̄ and S², 340
Scale family, 204 
Sequence of events, 12 
limit inferior, 12 
limit set, 12 
limit superior, 12 
nondecreasing, 12 
nonincreasing, 12 
Set function, 7 
Shortest-length confidence interval(s), 546 
for the mean of normal, 547—548 
for the parameter of exponential, 552 
for the parameter of uniform, 551 
for the variance of normal, 549 
Shrinkage estimator, 451 
σ-field, 3
choice of, 3 
generated by a class = smallest, 41
Sign test, 614 
Similar tests, 480 
Single-sample problem(s), 608 
of fit, 608 
of location, 614 
and symmetry, 614 
Skewness, coefficient of, 85 
Slow variation, function of, 77 
Slutsky’s theorem, 270 
Spearman's rank correlation coefficient, 
639 
distribution, 640 
Stable distribution, 225, 294 
Standard deviation, 79 
Standard PDF, 204 
Standardized RV, 80 
Statistic of order k, 171 
marginal PDF, 174 
Stirling’s approximation, 202 
Stochastic ordering, 625 
Strong law of large numbers, 281 
Borel’s, 287 
Kolmogorov’s, 288 
Student’s t-distribution:
central, 327 
bivariate, 352 
moments, 329 
noncentral, 329
moments, 329 
Student's t-statistic, 327 
Student’s t-test, 512, 513 
as generalized likelihood ratio test, 493 
for paired observations, 515 
robustness of, 655 
Substitution principle, 406 
estimator, 406 
Sufficient statistic, 359, 599 
factorization criterion, 361 
joint, 362 
Sufficient statistic for, Bernoulli, 362 
beta, 374 
discrete uniform, 363 
gamma, 374 
lognormal, 375 
normal, 363 
Poisson, 360 
uniform, 364 
Support, of a DF, 51, 106 
Survival function, 237 
Symmetric DF or RV, 71 
Symmetrization, 125 
Symmetrized RV, 125 
Symmetry, center of, 71 


Tail probabilities, 74 
Test(s), α-similar, 480
chi-square, 500 
critical = rejection region, 456
critical function, 456 
of hypothesis, 456 
F-, 518
invariant, 482 
level of significance, 456 
locally most powerful, 487 
most powerful, 455 
nonrandomized, 457 
one-tailed, 513 
power function, 457 
randomized, 457 
similar, 480 
size, 457 
statistic, 458 
Student's t, 512, 513
two-tailed, 513 
unbiased, 479 
uniformly most powerful, 457 
Testing the hypothesis of: 
equality of several normal means, 561 
goodness-of-fit, 505, 608 
homogeneity, 507—508 
independence, 633, 635, 639 
Tests of hypothesis:

Bayes, 523 

GLR, 491 

minimax, 521 

Neyman-Pearson, 464
Tests of hypothesis listed: 

chi-square tests, 500 

F-tests, 518 

t-tests, 512 
Tests of location, 614 

sign test, 614 

Wilcoxon signed-rank, 617 
Tolerance coefficient, 644 
Tolerance interval, 644 
Total probability rule, 29 
Transformation, 57, 128 

of continuous type, 60, 128 

of discrete type, 58, 128 

Helmert, 342 

Jacobian of, 133 

not one-to-one, 134 

one-to-one, 58, 133 
Triangular distribution, 53 
Trimmed mean, 657 
Trinomial distribution, 198 
Truncated distribution, 114 
Truncated RVs, 115 
Truncation, 114 
t-statistic, 327 
Two-point distribution, 180 
Two-sample problems, 624 
Types of error in testing hypotheses, 456 


Unbiased confidence interval, 553, 555-556 
general method of construction, 553 
for mean of normal, 554 
for parameter of exponential, 559 
for parameter of uniform, 559 
for variance of normal, 556 

Unbiased estimator, 377 
best linear, 379 
and complete sufficient statistic, 383 
LMV, 379 
and sufficient statistic, 382 
UMV, 379 

Unbiased estimation for parameter of: 
Bernoulli, 384, 387 
bivariate normal, 389
discrete uniform, 388 
exponential, 388 
hypergeometric, 388 
negative binomial, 387 
normal, 384 
Poisson, 382 

Unbiased test, 479 
for mean of normal, 481 
and similar test, 480 
UMP, 479 

Uncorrelated RVs, 151 

Uniform distribution, 59, 73, 207 
characterization, 209 
discrete, 182 
generating samples, 208 
MGF, 207
moments, 73, 207 
statistic of order k, 176, 221 
truncated, 115 

UMP test(s), 457, 479, 480, 483 
α-similar, 480
invariant, 483 
unbiased, 479 

U-statistic, 600 
for estimating mean and variance, 601 
one sample, 600 
two sample, 605 


Variance, 79 
properties of, 79 
of sum of RVs, 155 
Variance stabilizing transformations, 336 


Weak law of large numbers, 274, 275, 278 
centering and norming constants, 274 
Weibull distribution, 233 
Welch approximate t-test, 514 
Wilcoxon score statistic, 629 
Wilcoxon signed-ranks test, 617 
Wilcoxon statistic, 617 
distribution, 618-619, 622 
generating function, 95 
moments, 622 
Winsorization, 116 
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Statistical Methods for Comparative Studies 
*ARTHANARI and DODGE - Mathematical Programming in Statistics 
ASMUSSEN : Applied Probability and Queues 
*BAILEY - The Elements of Stochastic Processes with Applications to the Natural 
Sciences 
BARNETT and LEWIS - Outliers in Statistical Data, Third Edition 
BARTHOLOMEW, FORBES, and McLEAN : Statistical Techniques for Manpower 
Planning, Second Edition © 
BASU and RIGDON · Statistical Methods for the Reliability of Repairable Systems
BATES and WATTS · Nonlinear Regression Analysis and Its Applications
BECHHOFER, SANTNER, and GOLDSMAN · Design and Analysis of Experiments for
Statistical Selection, Screening, and Multiple Comparisons
BELSLEY · Conditioning Diagnostics: Collinearity and Weak Data in Regression
BHAT · Elements of Applied Stochastic Processes, Second Edition
BIRKES and DODGE · Alternative Methods of Regression
BOULEAU · Numerical Methods for Stochastic Processes
Methods: Statistical Methods of Model Building
CHATTERJEE and HADI · Sensitivity Analysis in Linear Regression
CLARKE and DISNEY · Probability and Random Processes: A First Course with
*COCHRAN and COX · Experimental Designs, Second Edition
*HAHN · Statistical Models in Engineering
HOCKING · Methods and Applications of Linear Models: Regression and the Analysis
HOGG and KLUGMAN · Loss Distributions
HØYLAND and RAUSAND · System Reliability Theory: Models and Statistical Methods
HUBERTY · Applied Discriminant Analysis
JOHNSON · Multivariate Statistical Simulation
Time-Dependent Systems with Practical Applications
MAGNUS and NEUDECKER · Matrix Differential Calculus with Applications in
MONTGOMERY and PECK · Introduction to Linear Regression Analysis, Second Edition
PORT · Theoretical Probability for Applications
PUTERMAN · Markov Decision Processes: Discrete Stochastic Dynamic Programming
RUBINSTEIN · Simulation and the Monte Carlo Method
SCHUSS · Theory and Applications of Stochastic Differential Equations
*SEARLE · Linear Models
Edition
THOMPSON · Sampling
THOMPSON · Simulation: A Modeler's Approach
TIJMS · Stochastic Modeling and Analysis: A Computational Approach
TIJMS · Stochastic Models: An Algorithmic Approach
TITTERINGTON, SMITH, and MAKOV · Statistical Analysis of Finite Mixture
Distributions
UPTON and FINGLETON · Spatial Data Analysis by Example, Volume I: Point
Pattern and Quantitative Data
UPTON and FINGLETON · Spatial Data Analysis by Example, Volume II:
VAN RIJCKEVORSEL and DE LEEUW · Component and Correspondence Analysis
VIDAKOVIC · Statistical Modeling by Wavelets
WEISBERG · Applied Linear Regression, Second Edition
WESTFALL and YOUNG · Resampling-Based Multiple Testing: Examples and
Methods for p-Value Adjustment
WHITTLE · Systems in Stochastic Equilibrium
*ZELLNER · An Introduction to Bayesian Inference in Econometrics
ARMITAGE and DAVID (editors) · Advances in Biometry
BROWN and HOLLANDER · Statistics: A Biomedical Introduction
CHOW and LIU · Design and Analysis of Clinical Trials: Concepts and Methodologies
DUNN · Basic Statistics: A Primer for the Biomedical Sciences, Second Edition
DUNN and CLARK · Applied Statistics: Analysis of Variance and Regression, Second
Edition
*ELANDT-JOHNSON and JOHNSON · Survival Models and Data Analysis
*FLEISS · The Design and Analysis of Clinical Experiments
FLEISS · Statistical Methods for Rates and Proportions, Second Edition
FLEMING and HARRINGTON · Counting Processes and Survival Analysis
KADANE · Bayesian Methods and Ethics in a Clinical Trial Design
KALBFLEISCH and PRENTICE · The Statistical Analysis of Failure Time Data
LACHIN · Biostatistical Methods: The Assessment of Relative Risks
LANGE, RYAN, BILLARD, BRILLINGER, CONQUEST, and GREENHOUSE ·
Case Studies in Biometry
LAWLESS · Statistical Models and Methods for Lifetime Data
LEE · Statistical Methods for Survival Data Analysis, Second Edition
MALLER and ZHOU · Survival Analysis with Long Term Survivors
McNEIL · Epidemiological Research Methods
McFADDEN · Management of Data in Clinical Trials
*MILLER · Survival Analysis, Second Edition
PIANTADOSI · Clinical Trials: A Methodologic Perspective
WOODING · Planning Pharmaceutical Clinical Trials: Basic Statistical Principles
WOOLSON · Statistical Methods for the Analysis of Biomedical Data
ANDERSON · An Introduction to Multivariate Statistical Analysis, Second Edition
ANDERSON and LOYNES · The Teaching of Practical Statistics
ARMITAGE and COLTON · Encyclopedia of Biostatistics: Volumes 1 to 6 with Index
Third Edition
BHATTACHARYA and JOHNSON · Statistical Concepts and Methods
BILLINGSLEY · Probability and Measure, Second Edition
BOX · R. A. Fisher, the Life of a Scientist
BOX, HUNTER, and HUNTER · Statistics for Experimenters: An Introduction to
Design, Data Analysis, and Model Building
BOX and LUCEÑO · Statistical Control by Monitoring and Feedback Adjustment
CHATTERJEE and PRICE · Regression Analysis by Example, Third Edition
COOK and WEISBERG · Applied Regression Including Computing and Graphics
COOK and WEISBERG · An Introduction to Regression Graphics
COX · A Handbook of Introductory Statistical Methods
DANIEL · Biostatistics: A Foundation for Analysis in the Health Sciences, Sixth Edition
DILLON and GOLDSTEIN · Multivariate Analysis: Methods and Applications
DRAPER and SMITH · Applied Regression Analysis, Third Edition
DUDEWICZ and MISHRA · Modern Mathematical Statistics
EVANS, HASTINGS, and PEACOCK · Statistical Distributions, Third Edition
FISHER and VAN BELLE · Biostatistics: A Methodology for the Health Sciences
FREEMAN and SMITH · Aspects of Uncertainty: A Tribute to D. V. Lindley
GROSS and HARRIS · Fundamentals of Queueing Theory, Third Edition
HALD · A History of Probability and Statistics and Their Applications Before 1750
HALD · A History of Mathematical Statistics from 1750 to 1930
HELLER · MACSYMA for Statisticians
HOEL · Introduction to Mathematical Statistics, Fifth Edition
HOLLANDER and WOLFE · Nonparametric Statistical Methods, Second Edition
HOSMER and LEMESHOW · Applied Logistic Regression, Second Edition
HOSMER and LEMESHOW · Applied Survival Analysis: Regression Modeling of
Time to Event Data
JOHNSON and BALAKRISHNAN · Advances in the Theory and Practice of Statistics: A
Seventeenth Century to the Present
JUDGE, GRIFFITHS, HILL, LÜTKEPOHL, and LEE · The Theory and Practice of
Econometrics, Second Edition
KHURI · Advanced Calculus with Applications in Statistics
KOTZ and JOHNSON (editors) · Encyclopedia of Statistical Sciences: Volumes 1 to 9
KOTZ and JOHNSON (editors) · Encyclopedia of Statistical Sciences: Supplement Volume
LAMPERTI · Probability: A Survey of the Mathematical Theory, Second Edition
LARSON · Introduction to Probability Theory and Statistical Inference, Third Edition
LE · Applied Categorical Data Analysis
LE · Applied Survival Analysis
MALLOWS · Design, Data, and Analysis by Some Friends of Cuthbert Daniel
MARDIA · The Art of Statistical Science: A Tribute to G. S. Watson
MASON, GUNST, and HESS · Statistical Design and Analysis of Experiments with
Applications to Engineering and Science
McCULLOCH and SEARLE · Generalized, Linear, and Mixed Models
MURRAY · X-STAT 2.0 Statistical Experimentation, Design Data Analysis, and
Nonlinear Optimization
PURI, VILAPLANA, and WERTZ · New Perspectives in Theoretical and Applied
Statistics
RENCHER · Linear Models in Statistics
RENCHER · Methods of Multivariate Analysis
RENCHER · Multivariate Statistical Inference with Applications
ROSS · Introduction to Probability and Statistics for Engineers and Scientists
ROHATGI · An Introduction to Probability Theory and Mathematical Statistics
ROHATGI and SALEH · An Introduction to Probability and Statistics, Second Edition
RYAN · Modern Regression Methods
SCHOTT · Matrix Analysis for Statistics
SEARLE · Matrix Algebra Useful for Statistics
STYAN · The Collected Papers of T. W. Anderson: 1943–1985
TIAO, BISGAARD, HILL, PEÑA, and STIGLER (editors) · Box on Quality and
Discovery: with Design, Control, and Robustness
TIERNEY · LISP-STAT: An Object-Oriented Environment for Statistical Computing
and Dynamic Graphics


*Now available in a lower-priced paperback edition in the Wiley Classics Library.


Texts, References, and Pocketbooks (Continued)
WONNACOTT and WONNACOTT · Econometrics, Second Edition
WU and HAMADA · Experiments: Planning, Analysis, and Parameter Design
Optimization
KHATTREE · Applied Descriptive Multivariate Statistics Using SAS Software
ROLSKI, SCHMIDLI, SCHMIDT, and TEUGELS · Stochastic Processes for Insurance
and Finance


WILEY SERIES IN PROBABILITY AND STATISTICS 
ESTABLISHED BY WALTER A. SHEWHART AND SAMUEL S. WILKS 


Editors 
Robert M. Groves, Graham Kalton, J. N. K. Rao, Norbert Schwarz,
Christopher Skinner
Survey Methodology Section 


BIEMER, GROVES, LYBERG, MATHIOWETZ, and SUDMAN · Measurement
Errors in Surveys
COCHRAN · Sampling Techniques, Third Edition
COUPER, BAKER, BETHLEHEM, CLARK, MARTIN, NICHOLLS, and O'REILLY
(editors) · Computer Assisted Survey Information Collection
COX, BINDER, CHINNAPPA, CHRISTIANSON, COLLEDGE, and KOTT (editors) ·
KORN and GRAUBARD · Analysis of Health Surveys